[totem-pl-parser/gnome-2-28] 2.28.3

From: Bastien Nocera <hadess src gnome org>
To: commits-list gnome org
Cc:
Subject: [totem-pl-parser/gnome-2-28] 2.28.3
Date: Wed, 12 May 2010 12:37:28 +0000 (UTC)
commit 654f42f5e85c06c811a9e8f1c948eed3135fd5d8
Author: Bastien Nocera <hadess hadess net>
Date:   Wed May 12 13:29:18 2010 +0100

    2.28.3
    
    Add missing HackerMedley test

 NEWS                       |    9 +
 configure.in               |    2 +-
 plparse/tests/HackerMedley |  434 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 444 insertions(+), 1 deletions(-)
---
diff --git a/NEWS b/NEWS
index 0f2c124..9637682 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,14 @@
 New features and significant updates in version...
 
+2.28.3:
+* Fix compilation on non-GNU platforms
+* Add introspection support
+* Fix parsing of a number of Podcasts, including possible crashers
+* Fix out-of-order ASX playlists
+* Fix memory leak when parsing directories
+* Fix parsing of playlists on HTTP servers when they
+  don't match the suffix used (eg. PHP page giving an XSPF playlist)
+
 2.28.2
 * Add support for subtitle properties in SMIL files
 * Make totem-pl-parser's XML parsing thread-safe
diff --git a/configure.in b/configure.in
index 7c956fe..baa0b60 100644
--- a/configure.in
+++ b/configure.in
@@ -2,7 +2,7 @@ AC_PREREQ(2.62)
 
 m4_define(totem_version_major, 2)
 m4_define(totem_version_minor, 28)
-m4_define(totem_version_micro, 2)
+m4_define(totem_version_micro, 3)
 
 AC_INIT([totem-pl-parser],
         [totem_version_major.totem_version_minor.totem_version_micro],
diff --git a/plparse/tests/HackerMedley b/plparse/tests/HackerMedley
new file mode 100644
index 0000000..b1bea89
--- /dev/null
+++ b/plparse/tests/HackerMedley
@@ -0,0 +1,434 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<?xml-stylesheet type="text/xsl" media="screen" href="/~d/styles/rss2enclosuresfull.xsl"?><?xml-stylesheet type="text/css" media="screen" href="http://feeds.feedburner.com/~d/styles/itemcontent.css"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
+
+<channel>
+	<title>Hacker Medley</title>
+	
+	<link>http://hackermedley.org</link>
+	<description>A podcast for curious hackers</description>
+	<lastBuildDate>Tue, 30 Mar 2010 18:13:31 +0000</lastBuildDate>
+	<generator>http://wordpress.org/?v=2.9.1</generator>
+	<language>en</language>
+	<sy:updatePeriod>hourly</sy:updatePeriod>
+	<sy:updateFrequency>1</sy:updateFrequency>
+			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" type="application/rss+xml" href="http://feeds.feedburner.com/HackerMedley" /><feedburner:info uri="hackermedley" /><atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://pubsubhubbub.appspot.com/" /><media:copyright>Copyright Hacker Medley. Licensed CC-BY-SA.</media:copyright><media:thumbnail url="http://hackermedley.org/images/splash.jpg" /><media:keywords>hacker,software,programmer,code,linux,programming,open,source,geek,geeky</media:keywords><media:category scheme="http://www.itunes.com/dtds/podcast-1.0.dtd">Technology/Tech News</media:category><itunes:owner><itunes:email>nat nat org</itunes:email><itunes:name>Nat Friedman and Alex Graveley</itunes:name></itunes:owner><itunes:author>Nat Friedman and Alex Graveley</itunes:author><itunes:explicit>no</itunes:explicit><itunes:image href="http://hackermedley.org/images/splash.jpg" /><itunes:keywords>hacker,software,programmer,code,linux,programming,open,source,geek,geeky</itunes:keywords><itunes:subtitle>A short podcast for curious hackers.</itunes:subtitle><itunes:summary>Hacker Medley is a short podcast for curious hackers.  Our goal is to talk about the cool things we've learned that we love explaining to our friends.  We're programmers, so software and technology will probably be our meat and potatoes. But we won't restrict ourselves! Any subject that might interest a hacker is fair game.</itunes:summary><itunes:category text="Technology"><itunes:category text="Tech News" /></itunes:category><item>
+		<title>Episode 4: Humans Only</title>
+		<link>http://feedproxy.google.com/~r/HackerMedley/~3/bSbZTMCHHB0/96</link>
+		<comments>http://hackermedley.org/archives/96#comments</comments>
+		<pubDate>Thu, 18 Mar 2010 03:32:48 +0000</pubDate>
+		<dc:creator>nat nat org (Nat Friedman and Alex Graveley)</dc:creator>
+				<category><![CDATA[Uncategorized]]></category>
+
+		<guid isPermaLink="false">http://hackermedley.org/?p=96</guid>
+		<description><![CDATA[
+For our fourth episode, we decided to try making a long, in-depth show about those squiggly word puzzles you find all over the internet, called CAPTCHAs. This is our first show that contains interviews, including of the happy fellow you see above, Dr. Andrei Broder, the Chief Scientist at Yahoo!. You&#8217;ll hear from him quite [...]]]></description>
+			<content:encoded><![CDATA[<p style="text-align: center;"><img class="aligncenter" title="Andrei Broder" src="http://hackermedley.org/images/andreibroder.jpg" alt="" width="500" height="375" /></p>
+<p>For our fourth episode, we decided to try making a long, in-depth show about those squiggly word puzzles you find all over the internet, called CAPTCHAs. This is our first show that contains interviews, including of the happy fellow you see above, Dr. Andrei Broder, the Chief Scientist at Yahoo!. You&#8217;ll hear from him quite a bit in this episode.</p>
+
+<p>This show is almost 50 minutes long. We hope you enjoy it. Right now we&#8217;re thinking about this as sort of a special occasion. Most of our shows will likely be shorter &#8212; mostly because they&#8217;re easier to make (Nat spent over 100 hours on this one). Unless you tell us long is the way to go!</p>
+<p>And on that note, we&#8217;d love to get your feedback on this show in the comments below. Constructive criticism and gushing encouragement are all welcome!</p>
+<p>If you want to learn more about the topics we discussed, here are some handy links.</p>
+<p>The Interviewees</p>
+<ul>
+<li><a href="http://research.yahoo.com/Andrei_Broder">Dr. Andrei Broder</a>, Chief Scientist at Yahoo&#8217;s Advertising Technology Group</li>
+<li><a href="http://bmaurer.blogspot.com/">Ben Maurer</a>, co-founder of reCAPTCHA</li>
+<li><a href="http://research.microsoft.com/en-us/um/people/kumarc/">Dr. Kumar Chellapilla</a>, Scientist at Microsoft Research</li>
+<li><a href="http://userscripts.org/scripts/show/38736">Shaun Friedle</a>, creator of the Megaupload autofill CAPTCHA greasemonkey script</li>
+</ul>
+<p>CAPTCHA basics</p>
+<ul>
+<li>The <a href="http://www.captcha.net/">official CAPTCHA website</a></li>
+<li>Alan Turing&#8217;s 1950 paper, <a href="http://loebner.net/Prizef/TuringArticle.html">Computing Machinery and Intelligence</a>, wherein he poses the Turing Test</li>
+<li>A nice little <a href="http://www2.parc.com/istl/projects/captcha/history.htm">summary of the history of CAPTCHA</a></li>
+<li>A long <a href="http://www.wired.com/techbiz/it/magazine/15-07/ff_humancomp?currentPage=all">Wired article about CAPTCHA and Luis von Ahn&#8217;s GWAP project</a></li>
+<li><a href="http://recaptcha.net">reCAPTCHA </a>- solve spam, read books</li>
+<li><a href="http://www.pcworld.com/article/140507/beware_the_cyberlover_that_steals_personal_data.html">CyberLover</a> &#8211; the bot that steals personal information</li>
+<li>The Photoshop Phriday competition to make <a href="http://www.somethingawful.com/d/photoshop-phriday/recaptcha-paint.php?page=5">funny pictures from reCAPTCHA word combinations</a></li>
+<li>A funny <a href="http://xkcd.com/632/">xkcd about CAPTCHA and turing tests</a></li>
+<li>The <a href="http://www.google.com/patents?vid=USPAT6195698">CAPTCHA patent</a></li>
+<li><a href="http://taylorhayward.posterous.com/3d-images-as-a-captcha">Taylor Hayward&#8217;s work on 3D images as CAPTCHAs</a></li>
+</ul>
+<p>Algorithmic attacks on CAPTCHA</p>
+<ul>
+<li><a href="http://research.microsoft.com/en-us/um/people/kumarc/pubs/chellapilla_nips04.pdf">Kumar Chellapilla&#8217;s paper on breaking CAPTCHAs</a> at Microsoft Research</li>
+<li>Shaun Friedle&#8217;s megaupload <a href="http://ejohn.org/blog/ocr-and-neural-nets-in-javascript/">autofill CAPTCHA greasemonkey script</a> as broken down in John Resig&#8217;s blog</li>
+</ul>
+<p>Convolutional Neural Networks</p>
+<ul>
+<li><a href="http://www.youtube.com/watch?v=IOHayh06LJ4">Video of the Hubel/Wiesel cat brain experiments</a>. Amazing example of reverse engineering.</li>
+<li><a href="http://yann.lecun.com/exdb/lenet/index.html">Yann LeCun&#8217;s LeNet-5, a convolutional neural network</a>. LeCun is one of the originators of the technique.</li>
+<li><a href="http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf">A great paper introducing convolutional neural networks</a></li>
+<li><a href="http://research.microsoft.com/pubs/68920/icdar03.pdf">Convolutional Neural Networks best practices</a>, a Microsoft Research paper from Patrice Simard</li>
+</ul>
+<p>CAPTCHA bypass services (aka CAPTCHA farms)</p>
+<ul>
+<li><a href="http://blogs.zdnet.com/security/?p=1835">Inside India&#8217;s CAPTCHA solving economy</a>, a ZDNet article</li>
+<li><a href="http://decaptcher.com">Decaptcher</a></li>
+<li><a href="http://www.spyderx.com/order_captcha_credits.php">Spyder CAPTCHA assist for myspace</a></li>
+</ul>
+<p>This episode contains two songs from <a href="http://magnatune.com/artists/ejp">Eternal Jazz Project</a>, a Swedish jazz band that released some of their music under the Creative Commons <a href="http://creativecommons.org/licenses/by-nc-sa/1.0/legalcode">BY-NC-SA</a> license on magnatune. This episode is distributed under the same license.</p>
+<div class="transcript">
+<h4>Transcript</h4>
+<div class="content">
+<div class="ts">00:00:00</div>
+<div class="music"></div>
+<div class="line"><span class="nick AndreiBroder">Broder:</span> There was a procedure called add URL, where you would come to a search engine and you would say, you know, here is the pages I just made. But anyway we had this problem and, of course, there was spammers and there were people that were adding the same page millions of times and wrote little scripts to add their pages. So we had somehow to slow the spammers. And this is how we came up with the idea that we need a test to distinguish between spammers and humans.</div>
+<div class="music"></div>
+<div class="ts">00:00:46</div>
+<div class="line"><span class="nick nat">Nat:</span> That was Dr. Andrei Broder, the Chief Scientist at Yahoo!, discussing his time at Altavista in 1997, when he led the team that invented a little thing called CAPTCHA.</div>
+<div class="line"><span class="nick alex">Alex:</span> And CAPTCHAs are the subject of our program today. We&#8217;re going to be exploring the state of the art in CAPTCHA generation and circumvention</div>
+<div class="line"><span class="nick nat">Nat:</span> I&#8217;m Nat Friedman, reporting from the Bavarian capital of Munich.</div>
+<div class="line"><span class="nick alex">Alex:</span> And I&#8217;m Alex Graveley, reporting from sunny, cloudy, cold San Francisco.</div>
+<div class="line"><span class="nick nat">Nat:</span> And this is Hacker Medley, the podcast for curious hackers.</div>
+<div class="music"></div>
+<div class="ts">00:01:28</div>
+<div class="line"><span class="nick nat">Nat:</span> Let&#8217;s see here. Word verification. Type the characters you see in the picture below.  Okay.  C-O, I think that&#8217;s a U.</div>
+<div class="line"><span class="nick alex">Alex:</span> Wait, is this supposed to be a word or is this just letters?</div>
+<div class="line"><span class="nick nat">Nat:</span> I think it says [coralia].  Is that a word?</div>
+<div class="line"><span class="nick alex">Alex:</span> I don&#8217;t know.</div>
+<div class="line"><span class="nick nat">Nat:</span> I think it&#8217;s just sort of just random letters that are pronounceable.  Okay.  I think it&#8217;s C-O-U, and I think there&#8217;s an R like tucked in there and that&#8217;s, wait that might not be an A actually.  I think that, yeah, that&#8217;s an A.  And then this is either a B or an LE.</div>
+<div class="ts">00:02:02</div>
+<div class="line"><span class="nick alex">Alex:</span> And here Nat is trying to solve a CAPTCHA, one of those squiggly word puzzles that you see all over the internet, where you have to type in the words that you see in order to enter a blog, comment or create a new mail account or even participate in an online poll.</div>
+<div class="line"><span class="nick nat">Nat:</span> The estimates are that we are, human beings as a species, are solving over 200 million CAPTCHAs every single day, but the very first CAPTCHA was implemented at AltaVista back in 1997. I interviewed Dr. Broder at his office in Santa Clara and asked him to tell us how it happened.</div>
+<div class="line"><span class="nick AndreiBroder">Broder:</span> I think from the very beginning we had kind of an idea that the problem has to be some kind of a pattern recognition problem because this is one area where humans are much better than machines.  And at some point it sort of started from, I think some lunch discussion and we were pointing out, someone was pointing out machines are not yet incredibly good at playing chess.  How come humans cannot make so much computation are good at chess and it&#8217;s all about pattern recognition.  So we knew that we need a pattern recognition problem.  And then we came with this one.</div>
+<div class="ts">00:03:07</div>
+<div class="line"><span class="nick nat">Nat:</span> How did you come up with the algorithm for distorting text?</div>
+<div class="line"><span class="nick AndreiBroder">Broder:</span> That one is a lot easier to tell you how we decided what things are because actually I had a scanner at home, and scanners were not so cheap as today, and I had a scanner, and I believe was made by Brothers but I&#8217;m not 100% sure, and the scanner came with a manual and they also had some OCR software, which came with the scanner.  And pretty much I looked in the manual and everything in the manual that they said it&#8217;s bad for OCR.</div>
+<div class="line"><span class="nick Andrei">Andrei:</span>  We decided why don&#8217;t we make it.  So one of the things that we&#8217;re saying, well it&#8217;s bad if the letters are misaligned, so we said okay they should be misaligned.  And it&#8217;s bad if you use multiple fonts, so we said okay use multiple fonts.  So it was all there.</div>
+<div class="ts">00:04:07</div>
+<div class="line"><span class="nick nat">Nat:</span> That&#8217;s a pretty interesting story, huh?</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, I love those old stories of like hacker epiphanies that solve really complex problems.  The funny thing is that search engines today don&#8217;t even use this scheme anymore, they just use PageRank, which crawls the whole web.  But instead, CAPTCHAs have turned out to be incredibly valuable for locking out spammers from pretty much all aspects of the internet.</div>
+<div class="line"><span class="nick nat">Nat:</span> You know, what&#8217;s kind of amazing to me is that these guys, this little team at AltaVista 12 years ago, they came up with this human detection technique and it&#8217;s pretty much exactly what we&#8217;re using today.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, I mean it looks pretty much the same to us but it is somewhat different, like the state-of-the-art has pushed these things towards being much harder for computers to solve.</div>
+<div class="ts">00:04:54</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah, that&#8217;s true.  I mean pattern recognition techniques and AI and computer vision have advanced a lot since then. And actually, that&#8217;s a good point, that kind of brings us to why Alex and I think CAPTCHAs are so interesting.  That little image, that little rectangle of distorted text on your web browser, that is kind of like a window into the world of artificial intelligence and how it relates to human capabilities.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, and specifically it&#8217;s just like this really interesting set of problems which are sort of described in that they&#8217;re tests that computers can generate and grade the answer to but which they can&#8217;t themselves solve very easily but that humans can solve really quickly.</div>
+<div class="line"><span class="nick nat">Nat:</span> So here are the criteria.  In order to be a viable CAPTCHA, a test has to be something that&#8217;s beyond the frontier of current artificial intelligence, but well within the capabilities of even really, really average people. So in a certain way, the set of all viable CAPTCHAs describes the ways in which people are still better and more capable than computers.</div>
+<div class="ts">00:05:53</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, and it shows you sort of the places where AI still has to grow and the limitations of what we can do, at least with regards to image recognition.</div>
+<div class="line"><span class="nick nat">Nat:</span> That&#8217;s a good point. But, of course, bad news is that AI is getting smarter and we&#8217;re not. So, you know, for the time being, at least when it comes to recognizing distorted text, we&#8217;re still well beyond computers but there&#8217;s no reason it&#8217;s going to stay that way stay forever.</div>
+<div class="line"><span class="nick nat">Nat:</span> Actually, Alex, by the way, the idea of CAPTCHA goes back to an earlier concept called a Turing Test.</div>
+<div class="line"><span class="nick alex">Alex:</span> I&#8217;ve heard of Turing Tests but it&#8217;s funny, I didn&#8217;t know that CAPTCHA stood for a Completely Automated Public Turing Test to tell Computers and Humans Apart, which is a pretty long acronym, but the important thing in there is that it is a form of Turing Tests. Nat maybe you can explain what that is?</div>
+<div class="line"><span class="nick nat">Nat:</span> Sure. So back in 1950 Alan Turing, the father of computing, wrote this really amazing paper called Computing Machinery and Intelligence. And what you have to understand is, in 1950 the transistor was only 3 years old. So computers were like really big, they were room sized, they were really loud and they didn&#8217;t do very much. So it was in this world of fairly limited computer capabilities that Turing asked an enormous question, and the question was: &#8220;Can machines think?&#8221; And this is like a philosophical question, and in order to answer it you&#8217;d have to define what thinking is. </div>
+<div class="ts">00:07:13</div>
+<div class="line"><span class="nick alex">Alex:</span> But I mean it&#8217;s interesting because people are just sort of sitting around with this big old computers waiting for punch cards to be processed and they had their heads in the clouds of these sort of abstract questions.</div>
+<div class="line"><span class="nick nat">Nat:</span> Right.  Now instead of going in a total abstract route though, Turing devised, he invented a game, a very concrete game, which he called &#8220;The Imitation Game.&#8221; And the way people usually describe the game is, you have a person who&#8217;s a judge, and he&#8217;s communicating with someone else who&#8217;s in another room, who could be a computer or a human being and they&#8217;re talking through little text messages, like IM or something, and the question is: can the judge tell if he&#8217;s talking to a computer or a person?</div>
+<div class="ts">00:07:50</div>
+<div class="line"><span class="nick alex">Alex:</span> That&#8217;s sort of what&#8217;s become the Turing test, which has been around so long at this point that it actually represents sort of this like unachievable holy grail of artificial intelligence.  And it represents, if it ever gets solved it represents the point at which computers can really convincingly simulate the interactions between humans.</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah.  Actually when I was a little kid my friends and I used to talk about the Turing Test, as you said like a kind of major milestone in artificial intelligence that we figured would have been solved by now.  But I hadn&#8217;t actually read the paper until we started doing the research for the show and what I discovered is that what Turing actually wrote is different from what we just described.  See in Turing&#8217;s original paper there&#8217;s three people, there&#8217;s a man, and a woman, and the judge, and they&#8217;re all in separate rooms, and the judge is trying to guess which is the man and which is the woman. And then what you do is you take either the man or the woman and you replace them with a computer, and the question is, does that change the judge&#8217;s accuracy from when he was talking with two humans?</div>
+<div class="ts">00:08:51</div>
+<div class="line"><span class="nick alex">Alex:</span> Kind of a weird twist. And the computer actor in that specific scenario is like trying to trick the judge into thinking that the human is lying and it&#8217;s all very confusing.  I still don&#8217;t fully understand why that question is posed in such an obscure and specific way but, you know, it&#8217;s Turing so chances are good that he was thinking about something that I&#8217;m not.</div>
+<div class="line"><span class="nick nat">Nat:</span> No question about that.</div>
+<div class="line"><span class="nick alex">Alex:</span> You know, it&#8217;s interesting because I think the Turing test is only hard to pass if you suspect you&#8217;re talking to a computer.</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah.  Actually there&#8217;s a whole bunch of examples of people spending hours talking to even like really poorly implemented chat bots that don&#8217;t even put a delay in before they respond to someone&#8217;s IM or something like that.  So they respond in a tenth of a second.  And actually I found a really funny screenshot online, it turns out there&#8217;s a Russian chat bot that&#8217;s called cyber lover, and what it does is it goes into chat rooms and on IM and it poses as an attractive female and it enters IM conversations with men, and it kind of gradually convinces them through these faked human interactions to give up their personal information.  And so the screenshot is of the dashboard for this chat bot and you can see all the men that it&#8217;s talking to and it reports as it gets their full name and their address and their credit card numbers and that sort of thing.  We&#8217;ll have to put that up on the website.</div>
+<div class="ts">00:10:21</div>
+<div class="line"><span class="nick alex">Alex:</span> I&#8217;m just going to go call Visa real quickly.</div>
+<div class="line"><span class="nick nat">Nat:</span> Getting back to the Turing test, though, there&#8217;s a bet on the site longbets.org, one of my favorite websites, between Mitch Kapor, who&#8217;s the founder of Lotus, and Ray Kurzweil, as to whether a computer will be able to pass a Turing test by 2029.</div>
+<div class="line"><span class="nick alex">Alex:</span>  Yeah, and that&#8217;s the commonly understood concept of the Turing test, not the sort of gender guessing, gender faking one. Mitch Kapor is betting that computers won&#8217;t do it, which seems kind of negative to me, and Ray Kurzweil is betting that they will because his sort of whole singularity concept depends on it.  And it&#8217;s a real bet.  There&#8217;s $20,000 on the line. </div>
+<div class="ts">00:11:04</div>
+<div class="line"><span class="nick nat">Nat:</span> So Alex, Turing posed this big question back in 1950, and then for 46 years, the AI community worked like crazy to try to build algorithms that could imitate human capabilities and even really simple uncontrolled situations.  And they haven&#8217;t really quite got there.  Actually I have a little blast from the past for you, Alex.  Let&#8217;s listen to this.</div>
+<p><b>DR SBAITSO CLIP</b></p>
+<div class="line"><span class="nick alex">Alex:</span>  Oh man, It&#8217;s my very first shrink.</div>
+<div class="line"><span class="nick nat">Nat:</span>  I don&#8217;t know if you remember that from the sound blaster?</div>
+<div class="line"><span class="nick alex">Alex:</span>  I totally do.  It&#8217;s like one of those programs that were on the, it was one of the demos that came on the sound blaster install disc.</div>
+<div class="line"><span class="nick nat">Nat:</span>  Yeah.  And then they had one with the talking parrot.  You remember that, too?  It had a different voice?</div>
+<div class="line"><span class="nick alex">Alex:</span>  I kind of remember the talking parrot.  Can you simulate the voice for me?</div>
+<div class="ts">00:11:52</div>
+<div class="line"><span class="nick nat">Nat:</span>  I don&#8217;t think I could.  So because computers were having so much trouble at even really simple human tasks, let alone actually imitating people in a human context, the conventional wisdom about AI has been, for decades, that AI is in a rut.  But then in 1996, a researcher at the Weizmann Institute in Israel named Moni Naor, he looked at the situation and he saw an opportunity. He figured that the things that people could do that AI was still failing to do, he figured could be used by COMPUTERS to automatically tell computers and humans apart.</div>
+<div class="line"><span class="nick alex">Alex:</span> Moni&#8217;s paper is called &#8220;Verification of a human in the loop or Identification via the Turing Test.&#8221;  And he had a bunch of really cool ideas, some kind of novel concepts for the kinds of puzzles that you could pose to humans to determine if they were in fact human.</div>
+<div class="line"><span class="nick nat">Nat:</span> Most of those puzzles you&#8217;ll see are kind of in the areas of like sensory processing, image recognition, that kind of thing. Actually I think we should just read a couple.</div>
+<div class="line"><span class="nick alex">Alex:</span> Alright, yeah.  There&#8217;s one that was the Gender recognition, which is actually kind of difficult if you show a picture of a face determining whether or not it&#8217;s a male or a female.</div>
+<div class="ts">00:12:56</div>
+<div class="line"><span class="nick nat">Nat:</span> I have trouble with that just in real life.</div>
+<div class="line"><span class="nick alex">Alex:</span>  Me, too.  I got hit the other day because of it.  And there&#8217;s facial expression understanding, whether the person in the picture is happy or sad.  And then there&#8217;s identifying body parts, which actually seems like a really difficult problem to me for computers to solve, being able to tell which, in a random picture, whether or not you can highlight the arm or the leg.</div>
+<div class="line"><span class="nick nat">Nat:</span> Here&#8217;s one I like, filling in words. Given a sentence where the subject has been deleted and a list of words, select one for the subject.</div>
+<div class="line"><span class="nick alex">Alex:</span> That&#8217;s kind of cool.  Sort of text comprehension.</div>
+<div class="line"><span class="nick nat">Nat:</span> And he also here mentions handwriting understanding, which is actually pretty close to what CAPTCHAs ended up being.</div>
+<div class="line"><span class="nick alex">Alex:</span> And he mentions also speech recognition, which is used in audio CAPTCHAs today for blind people.</div>
+<div class="line"><span class="nick nat">Nat:</span> So I mean Moni&#8217;s paper gives us a pretty good inkling of what CAPTCHA could be, but he wrote the paper before CAPTCHA was actually invented.  And a lot of these particular ideas, well they didn&#8217;t turn out to be that great.</div>
+<div class="ts">00:13:54</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, things like drawing a circle around a person in a scene or a person&#8217;s body part is actually kind of annoying to do in practice.  And also things like guess the word that fits into the sentence you can do by, if you index a lot of web pages you can determine which sentences are common or which word structures are very common.</div>
+<div class="line"><span class="nick nat">Nat:</span> And actually whenever you have a test that doesn&#8217;t have very many choices, like for example a binary choice, like male or female, if you just write a script that guesses randomly you&#8217;re going to be right 50% of the time. So that&#8217;s a pretty good pass rate for a pretty short script.  So you have to give the user like lots of binary choices, like five or ten or something like that, to make the random guessing pass rate low enough or whatever.  But anyway, totally independently of this paper that Moni Naor wrote, you had the work that was going on at Altavista. So kind of industry and academia were converging on the same point.</div>
+<div class="ts">00:14:48</div>
+<div class="line"><span class="nick alex">Alex:</span> Right. And a few years later at CMU, this totally awesome guy named Luis von Ahn and his professor Manuel Blum wrote a paper where they coined the term CAPTCHA and sort of formalized the whole concept. One thing that&#8217;s totally awesome in this paper and one of the reasons I like CAPTCHAs so much is that it points out that CAPTCHAs are pretty much a win-win situation, &#8220;either the CAPTCHA is not broken and there is a way to differentiate humans from computers, or the CAPTCHA is broken and a useful AI problem has been solved.&#8221;</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah, I love that, too.  I think that&#8217;s really cool.  So since 1996, 1997, the time when AltaVista invented CAPTCHA and these papers, CAPTCHAs have become super widespread.  Millions are solved every day.  And, by the way, the average CAPTCHA takes about 14 seconds to solve. So if you multiply that out that&#8217;s a lot of time that&#8217;s being spent by people solving CAPTCHAs every day.  And with all this work being done, Luis von Ahn saw an opportunity, and with a couple of other people, founded a company called reCAPTCHA.</div>
+<div class="ts">00:15:48</div>
+<div class="line"><span class="nick BenMaurer">Ben Maurer:</span> So my name is Ben Maurer.  I&#8217;m one of the cofounders of reCAPTCHA and I&#8217;m responsible for the design of our API and for our infrastructure.</div>
+<div class="line"><span class="nick Ben">Ben:</span> So people are solving 200 million CAPTCHAs a day, let&#8217;s say, and what they&#8217;re doing is they&#8217;re spending time doing something that by definition a computer can&#8217;t do. That&#8217;s automatically valuable because if we could give people a task that is useful then we&#8217;re getting something that we don&#8217;t otherwise have the ability to get.  And so we said what can we do with all this, you know, with all this human computation power?</div>
+<div class="line"><span class="nick alex">Alex:</span> So just to cut in, in case you don&#8217;t know what reCAPTCHA is, you&#8217;ve probably seen these before: they&#8217;re the CAPTCHAs that have two words that you have to type, the words are usually in some kind of old or smudgy print face, and there&#8217;s maybe a line drawn through them.</div>
+<div class="line"><span class="nick Ben">Ben:</span> And what we came up with is instead of having one word in the CAPTCHA we have two words and one of them is sort of a fake.  It&#8217;s not part of the CAPTCHA it&#8217;s just a word that we don&#8217;t know what it is and we want you to tell us what it is and we do that to digitize books and newspapers and other content that computers can&#8217;t read.</div>
+<div class="ts">00:17:00</div>
+<div class="line"><span class="nick nat">Nat:</span> So then what they do is they run two different OCRs over the text. Ben told me that they use a couple of commercial OCRs, and an open source one called Tesseract, which comes from Google, which is now considered pretty state of the art. And they identify words that the OCR software couldn&#8217;t recognize or doesn&#8217;t have a lot of confidence about. Ben explained it pretty well.</div>
+<div class="line"><span class="nick Ben">Ben:</span> So OCRs are never 100% sure whether they&#8217;re right or not.  But what we do is we take multiple OCR engines that use different algorithms and they tend to have failures that aren&#8217;t 100% correlated with each other. If they both agree then we sort of say we&#8217;re, it&#8217;s very likely that the word is correct.  We use a few other signals such as, you know, does the word fit in this sentence?  Like if you have, you know, one sentence we had in an old newspaper was that the motor ears were running down the street.  And motor ears is something that just doesn&#8217;t occur in the English language and what happened is a C looked like an E to the OCR and we have the ability to say motor ears is a bigram that just doesn&#8217;t typically appear and it&#8217;s suspicious.</div>
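+<p>A minimal sketch of the bigram check Ben describes, assuming a tiny hand-made table of bigram counts. The table, the threshold, and the function name are hypothetical and only illustrate the idea; this is not reCAPTCHA&#8217;s actual code or data.</p>

```python
# Hypothetical bigram counts; a real system would draw on a very large corpus.
BIGRAM_COUNTS = {
    ("motor", "cars"): 5200,
    ("cars", "were"): 8100,
    ("ears", "were"): 310,
    ("were", "running"): 9400,
    ("motor", "ears"): 0,
}


def suspicious_bigrams(words, min_count=1):
    """Return adjacent word pairs that appear fewer than min_count times."""
    flagged = []
    for pair in zip(words, words[1:]):
        if min_count > BIGRAM_COUNTS.get(pair, 0):
            flagged.append(pair)
    return flagged


# The OCR misread "cars" as "ears"; only that pair is flagged for humans.
ocr_output = ["motor", "ears", "were", "running"]
print(suspicious_bigrams(ocr_output))
```

+<p>In this sketch a pair missing from the table counts as zero, so unknown pairs are flagged too. That errs on the side of asking a human.</p>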
+<div class="ts">00:18:07</div>
+<div class="line"><span class="nick nat">Nat:</span> By the way, Alex, I thought it was nifty that they also use bigram probabilities to help identify which words the OCRs failed to recognize.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, I suspect that they&#8217;re using the one provided by Google where they have this huge bigram index, this big database you can download for a small fee, and it basically shows the occurrence of combination of words all over the web.</div>
+<div class="line"><span class="nick nat">Nat:</span> It makes sense actually also because Google ended up buying reCAPTCHA pretty recently.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah.  And reCAPTCHA has APIs in a whole bunch of languages. So it&#8217;s sort of a general-purpose CAPTCHA platform that you can just embed into your site. And these things are used everywhere, on Facebook, TicketMaster, Craigslist, Wikipedia&#8230;everywhere.</div>
+<div class="line"><span class="nick nat">Nat:</span> Ben told me, Alex, that reCAPTCHA&#8217;s actually getting a whole lot of old books and newspapers transcribed.</div>
+<div class="ts">00:18:55</div>
+<div class="line"><span class="nick Ben">Ben:</span> We&#8217;ve done about, I think about 50 years worth of the New York Times already and currently reCAPTCHA users are solving 50 million CAPTCHAs a day.</div>
+<div class="line"><span class="nick nat">Nat:</span> And by the way, Alex, The Times is paying reCAPTCHA for all that digitization work that they&#8217;re doing.</div>
+<div class="line"><span class="nick alex">Alex:</span> That&#8217;s pretty awesomely shrewd right there!</div>
+<div class="line"><span class="nick nat">Nat:</span> Definitely.</div>
+<div class="line"><span class="nick alex">Alex:</span> And they&#8217;re doing all this in pretty standard stuff with Python, nginx, and a lot of intelligent hackery.</div>
+<div class="line"><span class="nick nat">Nat:</span> Actually with all that scale, solving 50 million CAPTCHAs a day, I asked Ben a little bit about the architecture, and specifically how do they store the CAPTCHAs on disk.  Is it just one file per CAPTCHA image?   And here&#8217;s what he said:</div>
+<div class="line"><span class="nick Ben">Ben:</span> Yeah, that was originally how things worked and that&#8217;s a pretty big disaster just because every time you serve a CAPTCHA then you end up doing a disk seek.  And when you have a server that can serve a few thousand requests per second you can&#8217;t do a few thousand disk seeks per second.  It&#8217;s just too slow.</div>
+<div class="line"><span class="nick Ben">Ben:</span> And we found that one file per CAPTCHA, when we would get substantial load on the server the latency would become very high.  So we actually use a custom file format to store the CAPTCHAs that allows us to load a bunch of CAPTCHAs into memory at once.</div>
+<div class="ts">00:20:16</div>
+<div class="line"><span class="nick alex">Alex:</span> That&#8217;s great.  That&#8217;s another one of those sort of problems that you only run into when you have really large amounts of scale.</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah, and it&#8217;s cool to peek under the covers of an operation like that.</div>
+<div class="line"><span class="nick nat">Nat:</span> Now, by the way, reCAPTCHA doesn&#8217;t just take the scanned word off the page and present it to you unmodified, they actually distort the word a little bit before you see it in the CAPTCHA.</div>
+<div class="line"><span class="nick alex">Alex:</span> Right, like I said, they maybe draw a line through it or they make it wavy. And recently they started using these like XOR blobs where they would sort of switch the foreground of the word with the background of the word for part of the word.</div>
+<div class="ts">00:20:50</div>
+<div class="line"><span class="nick nat">Nat:</span> And the reason, Ben told me, that they do this is because, even though OCR software couldn&#8217;t recognize the word, you know, OCR software is not really designed to solve CAPTCHAs, it&#8217;s trying to get a balanced view of the document, so it might be possible to build an algorithm that could get enough CAPTCHAs right to be annoying. For example, Ben said that if you took a standard OCR software and just tweaked its algorithm to use its second best guess for what the word could be instead of its best guess that might solve enough reCAPTCHAs to be a problem. So that&#8217;s why they add extra distortion just for extra safety.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, and everyone we&#8217;ve talked to has basically said the same thing, which is that reCAPTCHA is one of the toughest CAPTCHAs out there, which is important because you only need, say, 10% of CAPTCHAs solved by your bot to create thousands of fake Gmail accounts or get a lot of SPAM comments through. So the team at reCAPTCHA works really hard to make their CAPTCHAs as difficult to break as possible, while still trying to keep them easy for humans to solve.</div>
+<div class="line"><span class="nick nat">Nat:</span> And they&#8217;ve had a pretty good balance but CAPTCHA was not always as secure as it is now.  And, Alex, there&#8217;s a funny story about that.</div>
+<div class="music"></div>
+<div class="ts">00:22:04</div>
+<div class="line"><span class="nick nat">Nat:</span> So back in the fall of 2004, Microsoft&#8217;s hotmail team, like most webmail services, one of their big concerns was SPAM, and specifically SPAMMERS using hotmail to send SPAM.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, and Nat, like every other webmail service on the planet, the sort of first line of defense is to ask people that are creating a new account to solve a CAPTCHA.</div>
+<div class="line"><span class="nick nat">Nat:</span> So hotmail was depending on CAPTCHAs to protect them from SPAM.  And they wanted to know: how safe are these things anyway? You know, how hard would it really be to build an algorithm to break a CAPTCHA? So being Microsoft, of course, they have a really substantial research division right on campus.  So they called up Microsoft Research and got in touch with a scientist in the division there named Kumar Chellapilla, who is a machine learning expert.</div>
+<div class="ts">00:22:52</div>
+<div class="line"><span class="nick Kumar">Kumar:</span> Yeah, so my actual relationship with CAPTCHA comes from machine learning.  So my actual PhD research work was on computational intelligence, and this is trying to build intelligent adversaries or agents that could act and train against humans.</div>
+<div class="line"><span class="nick Kumar">Kumar:</span> So these are models that you could train by giving it like input and output signal.  And for some, for my PhD work I did mostly game playing like checkers and chess and so on.</div>
+<div class="line"><span class="nick nat">Nat:</span> When Kumar joined Microsoft Research, he did some work on OCR technology and handwriting recognition specifically for their tablet PC project. </div>
+<div class="line"><span class="nick Kumar">Kumar:</span> One of the common areas is signature analysis.  How do you get a computer to look at two signatures and tell it to accept the signature or not?   These are very, very hard problems.</div>
+<div class="line"><span class="nick nat">Nat:</span> And so Kumar sat down and he looked at the most prominent CAPTCHAs on the web from the biggest companies on the web at the time, and here&#8217;s what he found.</div>
+<div class="line"><span class="nick Kumar">Kumar:</span> And I was surprised.  I have somewhat of an undergrad understanding of image processing, a doctorate level understanding in machine learning, and as I started applying some of these techniques, it was very easy to undo the challenges that were being put forth by the CAPTCHA.  And I was so surprised at how quickly this happened, that we immediately, I think in November 2004, December 2004, there&#8217;s this famous machine learning conference called Neural Information Processing Systems, and that was the first place where we presented a poster.</div>
+<div class="ts">00:24:17</div>
+<div class="line"><span class="nick Kumar">Kumar:</span>  And it was amazing.  We had about half a dozen different CAPTCHAs that were provided by several different people in the industry and we could show that many of them you could break like one out of two, one out of four, two out of three.</div>
+<div class="line"><span class="nick nat">Nat:</span> Now, Alex, as it turns out, solving a CAPTCHA is something that actually breaks down into two separate problems: first is the problem of segmentation, and then comes the problem of recognition.</div>
+<div class="line"><span class="nick alex">Alex:</span> And I didn&#8217;t know this beforehand but segmentation is the process of breaking a picture of a word up into individual letters. And the recognition is then taking each one of those sort of subpictures and identifying which letter it represents.</div>
+<div class="line"><span class="nick nat">Nat:</span> And what Kumar quickly discovered was that recognizing the letters in most of the CAPTCHAs at the time was pretty easy.</div>
+<p>[00:25:01]</p>
+<div class="line"><span class="nick Kumar">Kumar:</span> One of the problems we already solved by the time I started looking at CAPTCHAs was if you give me a single character, moderately distorted but not devastatingly distorted, then you sort of use your mouse or you point to the center of the character, I have techniques that can learn from that signal and basically give you the character that is there at that point.</div>
+<div class="line"><span class="nick nat">Nat:</span> The tool that Kumar was using was a special kind of neural network called a convolutional neural network.</div>
+<p>Actually, why don&#8217;t we start off and tell people what neural networks are.</p>
+<div class="line"><span class="nick alex">Alex:</span>  Yeah, sure.  Neural networks are this sort of pretty widely-used technique in AI that&#8217;s been around for a really long time. And the basic idea is that you have these neuron-like elements that have inputs and outputs and the inputs and outputs are sort of arranged with inputs going into other neurons and outputs going into other neurons. So for a given neuron each input has its own weight, which multiplies the input value. The neuron adds up those weighted inputs, and if it&#8217;s greater than a certain threshold then the neuron fires, meaning that it sends a signal to its output.  And the output signals of all these neurons sort of propagate through the network until you get the &#8220;answer&#8221; on a specific set of output neurons. </div>
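+<p>The threshold neuron Alex describes can be sketched in a few lines. The weights, thresholds, and the three-pixel input below are made-up values for illustration only; they are not taken from any network mentioned in the episode.</p>

```python
def neuron_fires(inputs, weights, threshold):
    """Multiply each input by its weight, add them up, and fire (1)
    only when the total is greater than the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0


# Two first-layer neurons look at the same inputs; their outputs
# become the inputs of a single output neuron.
pixels = [1, 0, 1]
hidden = [
    neuron_fires(pixels, [0.6, 0.2, 0.6], threshold=1.0),  # total 1.2, fires
    neuron_fires(pixels, [0.1, 0.9, 0.1], threshold=0.5),  # total 0.2, silent
]
output = neuron_fires(hidden, [1.0, 1.0], threshold=0.5)
print(hidden, output)
```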
+<div class="ts">00:26:15</div>
+<div class="line"><span class="nick nat">Nat:</span> Exactly.  So the basic idea for convolutional neural networks came from an experiment that was done back in 1959 by these two guys, David Hubel and Torsten Wiesel. What they did was they took a cat, and they put it under anesthesia. And then they inserted some electrodes directly into the cat&#8217;s visual cortex. And they opened its eyes and flashed different patterns of light and dark lines in front of the cat. And what they found was really interesting, they found that some neurons in the cat&#8217;s visual cortex fired rapidly in response to lines at one angle, and some neurons fired rapidly in response to lines at a different angle. So there was some angle sensitivity to different groups of neurons.  And there were other neurons in the visual cortex that were totally angle-independent.</div>
+<div class="ts">00:26:57</div>
+<div class="line"><span class="nick nat">Nat:</span> So what happened subsequent to that is, you know, this was obviously a pretty big result in neurology but some computer scientists got a hold of it and what they realized is they could take neural nets and they could arrange them like a cat&#8217;s visual cortex was arranged.  So at the lowest level you&#8217;d have neurons which are recognizing simple features in the image, like corners, edges at a certain angle or end points in certain regions of the image. And then there would be subsequent layers, which are usually called the hidden layers, in the neural network, and these subsequent layers would sort of combine those basic features to detect higher-order characteristics or features in the image. And if you have enough of those and the right kind you can start to recognize even really distorted letters or objects or things like that.  So it turns out that this special type of network, this convolutional neural network, which is sort of roughly based on the way vision works in mammals, is really good at image recognition.</div>
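+<p>To make the &#8220;edges at a certain angle&#8221; idea concrete, here is a rough sketch of two hand-written 3x3 filters slid across a tiny image. In a real convolutional network the filter values are learned rather than written by hand; the kernels, the image, and the function name here are illustrative assumptions.</p>

```python
def filter_response(image, kernel):
    """Slide a 3x3 kernel over the image and return the strongest
    absolute response found at any position."""
    best = 0
    for r in range(len(image) - 2):
        for c in range(len(image[0]) - 2):
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(3)
                for j in range(3)
            )
            best = max(best, abs(total))
    return best


# One filter responds to vertical edges, the other to horizontal edges.
VERTICAL = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
HORIZONTAL = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

# A 5x5 image containing a bright vertical bar in the middle column.
bar = [[0, 0, 1, 0, 0] for _ in range(5)]
print(filter_response(bar, VERTICAL), filter_response(bar, HORIZONTAL))
```

+<p>The vertical filter responds strongly to the bar and the horizontal one does not respond at all, which is a toy version of the angle sensitivity seen in the cat experiment.</p>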
+<div class="ts">00:27:53</div>
+<div class="line"><span class="nick nat">Nat:</span> Specifically, these networks were really good at recognizing handwriting.  And so when Kumar got assigned the whole CAPTCHA project he&#8217;d already been through Tablet PC and he had all this handwriting recognition experience and therefore he had this really powerful image recognition tool at his disposal.  And when he took this thing and he pointed it at the state of the art CAPTCHAs on the web it just blew them away.</div>
+<div class="line"><span class="nick Kumar">Kumar:</span> So four out of five characters I&#8217;d be able to recognize correctly or certainly nine out of ten.  So that was sort of like, that was the platform for almost all of my techniques.  I would try to reduce every CAPTCHA I saw out there with some ad hoc processing down to a place where I could just give it maybe like five or ten locations where I thought characters were and then this system would, it&#8217;s not free because you have to label like thousands and thousands of these laboriously but it&#8217;s a very automatable technique.</div>
+<div class="line"><span class="nick nat">Nat:</span> So, Alex, with the recognition problem solved, for Kumar breaking CAPTCHAs basically came down to just identifying the locations of the letters. And this is the segmentation problem. And in the CAPTCHAs that existed on the web in 2004, segmentation was actually not that hard.  Kumar explained to me how he solved TicketMaster&#8217;s CAPTCHA.</div>
+<div class="ts">00:29:01</div>
+<div class="line"><span class="nick Kumar">Kumar:</span> They were exclusively using these grids of slanted lines.  They were almost regular but not exactly.  They would tilt a little bit between the different parallel lines in the grid, but their text was always horizontal so, and the text was always thicker than the grid.  So if you did some blurring the background lines would blend into the background and then the words will stay up in the front. </div>
+<div class="line"><span class="nick Kumar">Kumar:</span> So if you had the word hello on a white piece of paper, hello being typed in black, and then you could get a stat of that page to this connected components algorithm, and what it will do is it will take, it will start at one of the black pixels, let&#8217;s say the H, it would grow that by looking at the neighboring black pixels and it will slowly grow it into the letter H.  Then once it has reached the edge of H it will no longer blend into the background so it will remove that letter H as a character.  And you can repeat this iteratively until you get all the characters.  So that&#8217;s another one where I think the Ticket Master one was reduced down to one of those and then we could build that out.  Register.com also had a similar one.  The very early MSN Hotmail one also did not have enough arcs, so some of the characters would not even be touching so you could easily eliminate those.</div>
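+<p>A minimal sketch of the connected components idea Kumar describes: start at one black pixel, keep growing into neighboring black pixels, and stop when only background is left around the region. The tiny text-art image and the choice of four-way neighbors are assumptions for illustration, not details from his paper.</p>

```python
def connected_components(grid):
    """Group black ('#') pixels into 4-connected regions.
    Returns a list of sets of (row, col) coordinates."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "#" or (r, c) in seen:
                continue
            # Grow a region outward from this seed pixel.
            stack = [(r, c)]
            seen.add((r, c))
            region = set()
            while stack:
                y, x = stack.pop()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (ny in range(rows) and nx in range(cols)
                            and grid[ny][nx] == "#" and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            components.append(region)
    return components


# A crude "H" and "T" separated by a column of background pixels.
image = [
    "#.#.###",
    "###..#.",
    "#.#..#.",
]
print(len(connected_components(image)))
```

+<p>Each region that comes back is one candidate character, which can then be handed to the recognizer.</p>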
+<div class="ts">00:30:23</div>
+<div class="line"><span class="nick nat">Nat:</span> So what do you do when the characters are touching?</div>
+<div class="line"><span class="nick Kumar">Kumar:</span> So if this island analogy works, you can think of everything in the background is a big ocean because it&#8217;s relatively flat and there&#8217;s these islands sticking out.  You could grow the islands, let&#8217;s say there&#8217;s more sedimentation and the land mass kinda moves out into the water, then if two islands are very close to each other then they may grow and they may connect each other.  So that allows you to sort of connect things.  And that&#8217;s usually like a growing operation standard image processing, things like halos and so on you can add to objects that way.</div>
+<div class="ts">00:30:56</div>
+<div class="line"><span class="nick Kumar">Kumar:</span>  You can also do erosion, which is the opposite.  You remove pixels that are very close to the edge of the character, so in the same island sense, you&#8217;re losing, because part of the island is eroding into the ocean and so that way you can separate two characters that are connected.  So if you have two, let&#8217;s say you have two O&#8217;s that are connected by a thin line, and let&#8217;s say for simplicity the O&#8217;s are more like filled-in circles, then as you erode, the circles shrink relatively evenly in all directions, so they may become smaller circles still filled in, but the line that&#8217;s connecting the two circles would slowly get to a point where once it becomes really thin, one pixel wide, another step of erosion would just completely cause the connected pixels to go away.  And now you&#8217;ve broken two O&#8217;s connected by a line into two O&#8217;s.  And once they&#8217;re separated you can then do the opposite.  You can now start to grow them back.  And so if you do something like the four steps of erosion followed by four steps of growing you would lose every line or anything that was thinner than four pixels wide.</div>
+<div class="line"><span class="nick Kumar">Kumar:</span>  And so that&#8217;s like a common, you erode to just make them disconnected, then you grow them back so that pieces of character that survived the erosion connect back.</div>
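+<p>The erode-then-grow trick can be sketched on a toy shape: two filled squares joined by a one-pixel-wide line. This sketch assumes a 3x3 neighborhood for both steps and a single step of each; the shape and helper names are hypothetical.</p>

```python
# The pixel itself plus its eight surrounding neighbors.
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def erode(pixels):
    """Keep a pixel only if its whole 3x3 neighborhood is set."""
    return {(y, x) for (y, x) in pixels
            if all((y + dy, x + dx) in pixels for dy, dx in NEIGHBORS)}


def dilate(pixels):
    """Grow the shape by one pixel in every direction."""
    return {(y + dy, x + dx) for (y, x) in pixels for dy, dx in NEIGHBORS}


def blob(top, left, size):
    """A filled square of set pixels."""
    return {(top + i, left + j) for i in range(size) for j in range(size)}


# Two filled 5x5 squares connected by a thin horizontal line.
shape = blob(0, 0, 5) | blob(0, 8, 5) | {(2, 5), (2, 6), (2, 7)}

# One step of erosion removes the line; one step of growing restores the squares.
opened = dilate(erode(shape))
print(len(shape), len(opened))
```

+<p>After the round trip the two squares are back at full size but the connecting line is gone, so a connected components pass would now see two separate characters.</p>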
+<div class="ts">00:32:13</div>
+<div class="line"><span class="nick alex">Alex:</span> That&#8217;s a totally awesome description, but I suspect in practice it&#8217;s a lot more nuanced and probably required a PhD to understand what the hell&#8217;s going on.</div>
+<div class="line"><span class="nick nat">Nat:</span> Well actually the paper&#8217;s pretty well written.  It&#8217;s pretty accessible.  We&#8217;ll put a link on our website if you want to check it out.  But what he basically said in the paper was: recognizing distorted characters is solved. If you want to make CAPTCHAs really hard, lean on the segmentation problem because identifying the locations of the characters is really surprisingly hard if you do things like make them touch or don&#8217;t just do totally trivial things to your CAPTCHA. So the best CAPTCHAs on the web today have adapted to pose much harder segmentation problems.</div>
+<div class="line"><span class="nick alex">Alex:</span> It seems so weird to me that image recognition can, you know, identify a letter, it just can&#8217;t figure out where it is.</div>
+<div class="ts">00:33:00</div>
+<div class="line"><span class="nick nat">Nat:</span> I know, right? It&#8217;s not intuitive at all.  Now actually, even though a lot of these issues were pointed out five years ago in Kumar&#8217;s paper, and Google and Microsoft and Yahoo and reCAPTCHA now have really good CAPTCHAs that are hard for computers to break, a lot of the CAPTCHAs that you find on the web and in the wild, and you and I have both run into these, they still mostly pose a recognition problem and not a segmentation problem. Actually, I asked Dr. Broder about this, and here&#8217;s what he said.</div>
+<div class="line"><span class="nick AndreiBroder">Broder:</span> You know, I see some CAPTCHAs that clearly are very hard for humans to solve but in fact they don&#8217;t introduce any difficulty for computers whatsoever.  They are simply creating some extra annoyance for humans without getting any quality.  I mean people have to realize what are the hard problems and what are not the hard problems.  And some of the CAPTCHAs are totally silly and I&#8217;m sure that you can use them as an exercise in any course in pattern recognition and people will solve them.</div>
+<div class="ts">00:34:08</div>
+<div class="music"></div>
+<div class="ts">00:34:19</div>
+<div class="line"><span class="nick alex">Alex:</span> If you look around the web today you can find like little Python scripts or other little programs that you can run to break some of the weaker CAPTCHAs out there.</div>
+<div class="line"><span class="nick nat">Nat:</span> And, Alex, actually I have a little treat for us.  I did some googling and I found a university student in Northern England who wrote a particularly cool CAPTCHA solver.</div>
+<div class="line"><span class="nick Shaun">Shaun:</span> Well, my name is Shaun Friedle and I&#8217;m the author of Megaupload auto-fill CAPTCHA, which is a GreaseMonkey script for Firefox which auto completes the CAPTCHA on megaupload.</div>
+<div class="line"><span class="nick alex">Alex:</span> Woah! That&#8217;s such a great hack, right?  Like this guy decided to start solving CAPTCHAs in the browser using Javascript.</div>
+<div class="ts">00:34:55</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah, I mean this is the definition of like a good hack basically.  I mean what Shaun Friedle did is he wrote a GreaseMonkey script that solves just this one particular CAPTCHA, on a site called Megaupload, which I&#8217;ve never heard of before but apparently is like one of those websites like rapidshare where you can upload a file and people can download it. And they use CAPTCHA to protect the download link from bots. And I asked Shaun how he got into this, what motivated him to do this in the first place.</div>
+<div class="line"><span class="nick Shaun">Shaun:</span> And then I came across a forum thread on the userscripts.org site.  Someone else was asking if it was possible to decode the reCAPTCHA using a GreaseMonkey script and all of these people were saying no that&#8217;s stupid, there&#8217;s no way you will be able to do it, that&#8217;s impossible.</div>
+<div class="line"><span class="nick nat">Nat:</span> And so you took that as a challenge, huh?</div>
+<div class="line"><span class="nick Shaun">Shaun:</span> Yeah, I thought well I don&#8217;t know if it&#8217;s really possible in the reCAPTCHA, I thought well I can probably try and do that just in GreaseMonkey purely in JavaScript on the megaupload CAPTCHA.  And at that time I had done no image processing in JavaScript.  In fact, I&#8217;d written about 100 lines of JavaScript before that point so I&#8217;m not really a JavaScript programmer.  So I started researching whether it was possible and I found out using the canvas functionality in HTML 5 you could do some image processing and eventually built it from there and managed to implement the entire thing in JavaScript.</div>
+<div class="ts">00:36:19</div>
+<div class="line"><span class="nick alex">Alex:</span>  I actually hadn&#8217;t heard of anybody doing sort of external image processing using Javascript and CANVAS like that.</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah, actually Shaun was really humble and he said he&#8217;d never done any image processing in Javascript before.  But I think like almost no one had ever done image processing in Javascript before he wrote this hack.  And then it ended up on John Resig&#8217;s blog, who is the author of jQuery, and a lot of people found it pretty interesting.  That&#8217;s actually how I found out about it.  But it does seem like a technique that could be useful in lots of different places. Anyway, Shaun had also previously read this game programming book, and learned about neural networks from that, and so he implemented a neural network in Javascript, and then he manually trained it by typing in a whole bunch of CAPTCHAs himself to recognize the megaupload CAPTCHA.</div>
+<div class="ts">00:37:02</div>
+<div class="line"><span class="nick alex">Alex:</span> And his script still works?</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah.  He told me it can solve the Megaupload CAPTCHA in about 200 milliseconds.</div>
+<div class="line"><span class="nick alex">Alex:</span> That&#8217;s not bad.  I think it&#8217;s a neat hack because it&#8217;s not necessarily anything novel research wise but doing it all in Javascript inside the browser and being a novice.  It just seems like really educational.</div>
+<div class="line"><span class="nick nat">Nat:</span> We asked everyone who has broken a CAPTCHA what they think when they run into CAPTCHAs on the web and he said the thing he thinks is that about 60% of the CAPTCHAs he encounters he could probably hack with his GreaseMonkey script with a few hours of modifications. And of course he&#8217;s talking about the CAPTCHAs, not the big company CAPTCHAs but the ones that don&#8217;t pose really hard segmentation problems.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, and that totally goes against my original thinking when we started this podcast, which is that the way you can be sure that a CAPTCHA works is by writing your own because you&#8217;ll be able to sort of hide anonymously on the internet because people won&#8217;t spend time solving your particular CAPTCHA.  But it turns out that, you know, unless you&#8217;re kind of tracking the leading edge in image recognition technology, like reCAPTCHA is, your CAPTCHAs are probably going to end up really, really trivial to solve.</div>
+<div class="ts">00:38:12</div>
+<div class="line"><span class="nick nat">Nat:</span> And you&#8217;re kind of right on one count though, Alex, which is that if your site is really tiny and nobody cares about it they&#8217;re not going to try to bother to break your CAPTCHA anyway.  But the cool thing with reCAPTCHA of course is that they&#8217;re going to always keep up with the latest attacks. It&#8217;s like a platform that&#8217;s always going to evolve with the attackers.</div>
+<p>Now we&#8217;ve been talking about some pretty sophisticated ways of attacking CAPTCHA. But there&#8217;s one very easy way to break a CAPTCHA we haven&#8217;t mentioned yet.</p>
+<div class="line"><span class="nick alex">Alex:</span> Is this the sort of legendary porn attack that I&#8217;ve always heard about?</div>
+<div class="line"><span class="nick nat">Nat:</span> Well that&#8217;s one.  Why don&#8217;t we talk about it first?</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, so I always heard that, like there&#8217;s always been this rumor that porn sites would stick CAPTCHAs up in front of people who wanted to look at porn images and they would have to solve the CAPTCHA in order to move on and see the pornographic image.  And that CAPTCHA they solved would then be forwarded along to some script that was creating an email account or posting a comment.</div>
+<div class="ts">00:39:07</div>
+<div class="line"><span class="nick nat">Nat:</span> This is like exactly the kind of story that&#8217;s designed to just be spread all over the internet because it involves like a cool hack and pornography.  But it turns out it&#8217;s not really an issue. The volume of CAPTCHAs that would be solved by this technique is just too low to actually make a dent. And it&#8217;s not really a very competitive thing for a porn site to do: for every one site that issues CAPTCHAs in front of their images, there are a thousand that won&#8217;t.  So it doesn&#8217;t add up economically. There is another way that humans can be used to break CAPTCHAs that actually is a bit more of an issue.</div>
+<div class="line"><span class="nick alex">Alex:</span> Oh, is this the sort of like CAPTCHA farm thing in India with lots of people solving CAPTCHAs?</div>
+<div class="line"><span class="nick nat">Nat:</span> They actually prefer the term &#8220;CAPTCHA bypass service.&#8221;</div>
+<div class="line"><span class="nick alex">Alex:</span> I&#8217;ve heard of these, too.  These are like teams of very low-wage people usually in poor countries just typing in CAPTCHAs for very, very small amounts of money all day long.  And I guess these guys  break CAPTCHAs and then they get forwarded along to create SPAM and blog comments and things like that.</div>
+<div class="ts">00:40:06</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah.  And actually we heard a funny story about this from a friend at Google. Apparently Google has this property Blogger, and apparently they were having problems with people creating SPAM blogs on Blogger.  So they added a CAPTCHA to the blog creation page and that helped for a while.  But then eventually the spam blogs came back. And they tracked the CAPTCHA solutions to this one IP address in Costa Rica. Instead of just blocking the server they decided to monitor it and they could see that the rate at which the CAPTCHAs were being solved was actually changing over the course of the day. At 9am they&#8217;d be solving something like say 10 CAPTCHAs per minute, and then half an hour later, at 9:30, they&#8217;d be solving like 20 CAPTCHAs per minute, and at 9:45 they&#8217;d be solving 30. And then it would maybe continue like that until 12 o&#8217;clock and drop to zero for an hour. And then at 1:00 it would pick back up again.  So they could deduce from this there&#8217;s a team of four people, drifting into work in the morning and then all going to lunch together in the afternoon solving CAPTCHAs for a living.</div>
+<div class="ts">00:41:03</div>
+<div class="line"><span class="nick alex">Alex:</span> The funny thing is that when these guys solved CAPTCHAs and then those CAPTCHAs are used to post SPAM on web pages, you know, they&#8217;re not actually expecting people to click on the links that are included in those SPAM comments, they&#8217;re usually just there to just trick Google&#8217;s PageRank algorithm into rating the spammy links higher.</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah, actually that&#8217;s a really good point and PageRank is big money.  So most of these CAPTCHA farms are actually a lot bigger than just four guys in Costa Rica somewhere.  We tried really hard to try to interview a CAPTCHA farmer for this podcast. None of the ones we contacted would agree to have their voice recorded for some reason, but they did answer some questions over email, and we&#8217;ll link some of their web pages online where they advertise their services.</div>
+<div class="ts">00:41:51</div>
+<div class="line"><span class="nick nat">Nat:</span> And you can see, for example, that the prices are just, I mean they are astoundingly cheap.  To solve 1000 CAPTCHAs, for example, one site called decaptcher.com charges just $2. So even if the workers take the average of 14 seconds to solve each CAPTCHA, and they don&#8217;t have any time between CAPTCHA solutions, that comes out to like fifty cents per hour. And actually by email, we learned that many of these CAPTCHA farmers are further kind of hindered by the fact that they&#8217;re not great typists and they don&#8217;t speak any English at all.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, that&#8217;s a pretty sucky situation.  You can imagine how hard it would be to solve CAPTCHAs in Hindi.  These guys are probably not even solving at the optimal or like the average 14 seconds for each one.</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah.  I mean if we had to solve CAPTCHAs in Hindi, good Lord.  The service, decaptcher, they provide APIs in a whole bunch of languages, you know, C, C++, Perl, Python, C#, etcetera, and they even have a FAQ question on their website.  I&#8217;ll read it for you.  Here&#8217;s the question: <i>I want to bypass CAPTCHAs from my bot. The bots all have different IPs. Is it possible to use your service from many IPs?</i>  Then they answer:  <i>we have no restrictions about IP: with DeCaptcher you can bypass CAPTCHA from as many IPs as you need.</i></div>
+<div class="ts">00:43:07</div>
+<div class="line"><span class="nick alex">Alex:</span> Wow. So they&#8217;re just like right out in the open about using botnets to solve CAPTCHAs, huh?</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah, seriously.  What it comes down to, really, with these CAPTCHA farms, you know, you can&#8217;t stop them.  They&#8217;re going to be out there, they&#8217;re going to exist.  It is a solution that somebody can just type in a CAPTCHA.  It&#8217;s not totally secure.  So CAPTCHA&#8217;s not about total security, CAPTCHA is really just about making spam uneconomical. If we go back to Broder at Yahoo! one more time, I think he put this really well:</div>
+<div class="ts">00:43:34</div>
+<div class="line"><span class="nick AndreiBroder">Broder:</span> Yeah, this is exactly right.  I mean it&#8217;s exactly the same problem you have in mail spam and there are actually nowadays good statistics about how many people are actually answering those ads for changing your anatomy and so on.  And it&#8217;s an incredibly small number, 1 in a million or 1 in 10 million or something like that.  And you can essentially compute a certain ROI so if you increase the cost even slightly suddenly the whole enterprise becomes non profitable.  And I think that&#8217;s basically what we are trying to increase the cost slightly but because you have to multiply it by a large number of attempts to make it nonprofitable.</div>
+<div class="music"></div>
+<div class="ts">00:44:32</div>
+<div class="line"><span class="nick nat">Nat:</span> So we&#8217;ve kind of moved from this big philosophical question, Can Machines Think, to rooms full of poor people typing in squiggly letters to help sell Viagra on the internet.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah, but along the way in talking about computers solving and posing these questions that really represent the bleeding edge of artificial intelligence. And it&#8217;s funny to think like, you know, could Turing have imagined that this would be the battleground for his, you know, his sort of ultimate question of can computers think?</div>
+<div class="ts">00:45:03</div>
+<div class="line"><span class="nick nat">Nat:</span> So I think one thing people want to know is where&#8217;s this all going? I mean what is the future of CAPTCHA?  Let&#8217;s talk a little bit about that.</div>
+<div class="line"><span class="nick alex">Alex:</span> Well, it&#8217;s still an active field so there&#8217;s new CAPTCHAs being invented all the time. I did one for Microsoft that just came out, it&#8217;s called ASIRRA, and this works by showing you a picture of an animal and asks you if it&#8217;s a dog or a cat. Now, like we said before, that&#8217;s a binary choice so they end up showing you a few of them to see that, you know, so that the odds of just guessing randomly aren&#8217;t quite so good. And there&#8217;s this other one from Google called rotCAPTCHA, which asks you to tell which pictures are facing right-way-up, which I guess is also difficult for computers.</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah, and actually computer vision, of course, is advancing, too.  Actually, Kumar, from Microsoft, told me that the whole 1d segmentation problem &#8211; a bunch of letters are in a slightly wavy line &#8211; that&#8217;s getting solved, too.  So the future in CAPTCHA letters might need to be scattered around in 2d in a plane. But eventually, of course, the machines are going to be able to do that, too. So the question that I kind of wanted to know since we started looking into this whole topic is &#8211; when&#8217;s that going to happen?  When will CAPTCHA no longer be viable as a concept?  Here&#8217;s Kumar Chellapilla once again.</div>
+<div class="ts">00:46:16</div>
+<div class="line"><span class="nick Kumar">Kumar:</span> It&#8217;s an adversarial problem.  So if you&#8217;re blocking spammers they&#8217;re going to work harder and they&#8217;re going to automate existing solutions to make them cheaper, like more adversarial problems.  There is one advantage, though, I should call it out.  It&#8217;s a lot easier to generate I mean lots and lots of more difficult CAPTCHAs than it is to break them.  So even in vision or in machine learning, computer vision machine learning, people talk about this synthesis versus analysis dichotomy, right?  Is it more difficult to ask difficult questions or is it more difficult to answer difficult questions?  And there are a lot of these nonsymmetrical problems where for the email or any freebie research entities, they can easily generate lots and lots and lots of difficult CAPTCHAs.</div>
+<div class="ts">00:47:08</div>
+<div class="line"><span class="nick nat">Nat:</span> So, Alex, before we started researching CAPTCHA for the show, I was pretty convinced that we were going to find out that AI was on just this collision path to make CAPTCHA irrelevant within, I don&#8217;t know, five to ten years. But it really doesn&#8217;t look that way to me anymore. I mean basically I think CAPTCHA is probably viable for a couple of decades or maybe longer.</div>
+<div class="line"><span class="nick alex">Alex:</span> Yeah.  And the other thing is that CAPTCHA is not even really designed to be 100% secure. So as these things slowly become more and more solvable it doesn&#8217;t necessarily mean that the whole system will fall apart.  It just means that there&#8217;ll be a little bit more SPAM and that will push the edge of research a little bit further.</div>
+<div class="ts">00:47:49</div>
+<div class="line"><span class="nick nat">Nat:</span> Yeah, Kumar actually compared CAPTCHAs to a speed bump. So it&#8217;s sort of like a little deterrent, you combine it with other techniques like content filtering and that&#8217;s how you get a really good result.  And, you know, even if the computers do catch up we go back to that whole win-win concept.  I mean that&#8217;s a win, too.  I think Ben Maurer from reCAPTCHA said it really well:</div>
+<div class="line"><span class="nick Ben">Ben:</span> I mean, if we get to the point where computers are able to do anything that a human can do I&#8217;ll be happy. I mean at that point computers will be able to do a really good job at filtering SPAM on their own and they won&#8217;t need CAPTCHAs.</div>
+<div class="music"></div>
+<div class="ts">00:48:30</div>
+<div class="line"><span class="nick nat">Nat:</span> Well, that was our show. We had a lot of fun studying CAPTCHAs and we hope you enjoyed it, too.  We&#8217;ve posted a whole bunch of interesting links from our research on hackermedley.org so that you can learn more about the Turing Test and neural networks and cat brains, and that kind of thing. So check it out.</div>
+<div class="line"><span class="nick alex">Alex:</span> This episode was a bit of an experiment in doing a longer form show, with interviews no less.  So we&#8217;d love to hear if you think it worked and especially if you think it didn&#8217;t work.  So please visit our website at hackermedley.org and give us some feedback.</div>
+<div class="line"><span class="nick nat">Nat:</span> Thanks for listening.</div>
+<div class="line"><span class="nick alex">Alex:</span>  Yeah, thanks.</div>
+<div class="music"></div>
+</div>
+</div>
+]]></content:encoded>
+			<wfw:commentRss>http://hackermedley.org/archives/96/feed</wfw:commentRss>
+		<slash:comments>22</slash:comments>
+
+		<media:content url="http://feedproxy.google.com/~r/HackerMedley/~5/p6w--o9l1-k/medley4.mp3" fileSize="59911070" type="audio/mpeg" /><itunes:explicit>no</itunes:explicit><itunes:subtitle> For our fourth episode, we decided to try making a long, in-depth show about those squiggly word puzzles you find all over the internet, called CAPTCHAs. This is our first show that contains interviews, including of the happy fellow you see above, Dr. An</itunes:subtitle><itunes:author>Nat Friedman and Alex Graveley</itunes:author><itunes:summary> For our fourth episode, we decided to try making a long, in-depth show about those squiggly word puzzles you find all over the internet, called CAPTCHAs. This is our first show that contains interviews, including of the happy fellow you see above, Dr. Andrei Broder, the Chief Scientist at Yahoo!. You&amp;#8217;ll hear from him quite [...]</itunes:summary><itunes:keywords>hacker,software,programmer,code,linux,programming,open,source,geek,geeky</itunes:keywords><feedburner:origLink>http://hackermedley.org/archives/96</feedburner:origLink><enclosure url="http://feedproxy.google.com/~r/HackerMedley/~5/p6w--o9l1-k/medley4.mp3" length="59911070" type="audio/mpeg" /><feedburner:origEnclosureLink>http://hackermedley.org/podcasts/medley4.mp3</feedburner:origEnclosureLink></item>
+		<item>
+		<title>Episode 3: tornado, node.js and websockets</title>
+		<link>http://feedproxy.google.com/~r/HackerMedley/~3/-myKuJMWZTg/86</link>
+		<comments>http://hackermedley.org/archives/86#comments</comments>
+		<pubDate>Sat, 20 Feb 2010 06:14:42 +0000</pubDate>
+		<dc:creator>nat nat org (Nat Friedman and Alex Graveley)</dc:creator>
+				<category><![CDATA[Uncategorized]]></category>
+
+		<guid isPermaLink="false">http://hackermedley.org/?p=86</guid>
+		<description><![CDATA[
+A quick overview of a few interesting new web technologies: tornado, node.js and WebSockets. Listen and enjoy!
+As always, we&#8217;d love to hear your thoughts and dreams and deepest desires.
+If you want to learn more, check out these links:
+
+Tornado homepage
+node.js web page.
+Using Django on top of a Tornado web server.
+Ryan Dahl&#8217;s livejournal page (the webcomic seems to [...]]]></description>
+			<content:encoded><![CDATA[<p style="text-align: center;"><img class="aligncenter" title="node.js logo" src="http://hackermedley.org/images/nodejs.png" alt="" width="527" height="270" /></p>
+<p>A quick overview of a few interesting new web technologies: tornado, node.js and WebSockets. Listen and enjoy!</p>
+
+<p>As always, we&#8217;d love to hear your thoughts and dreams and deepest desires.</p>
+<p>If you want to learn more, check out these links:</p>
+<ul>
+<li><a href="http://www.tornadoweb.org/">Tornado homepage</a></li>
+<li><a href="http://nodejs.org">node.js</a> web page.</li>
+<li><a href="http://lincolnloop.com/blog/2009/sep/15/using-django-inside-tornado-web-server/">Using Django on top of a Tornado web server</a>.</li>
+<li>Ryan Dahl&#8217;s <a href="http://four.livejournal.com/">livejournal page</a> (the webcomic seems to be gone though!)</li>
+<li><a href="http://www.kegel.com/c10k.html ">C10K problem </a>&#8211; why webservers need to handle 10,000 concurrent clients and move to event-based IO.</li>
+<li>WebSockets <a href="http://dev.w3.org/html5/websockets/">JavaScript API</a>, <a href="http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-75">wire protocol</a>, <a href="http://devthought.com/blog/2009/12/nodejs-and-the-websocket-protocol/">in node.js</a>, <a href="http://bret.appspot.com/entry/web-sockets-in-tornado">in Tornado</a>.</li>
+<li>A nice explanation of <a href="http://www.webdevelopmentbits.com/avoiding-long-polling">long polling</a> and other workarounds</li>
+</ul>
+]]></content:encoded>
+			<wfw:commentRss>http://hackermedley.org/archives/86/feed</wfw:commentRss>
+		<slash:comments>11</slash:comments>
+
+		<media:content url="http://feedproxy.google.com/~r/HackerMedley/~5/YrBjbH4XaeM/medley3.mp3" fileSize="14887025" type="audio/mpeg" /><itunes:explicit>no</itunes:explicit><itunes:subtitle> A quick overview of a few interesting new web technologies: tornado, node.js and WebSockets. Listen and enjoy! As always, we&amp;#8217;d love to hear your thoughts and dreams and deepest desires. If you want to learn more, check out these links: Tornado home</itunes:subtitle><itunes:author>Nat Friedman and Alex Graveley</itunes:author><itunes:summary> A quick overview of a few interesting new web technologies: tornado, node.js and WebSockets. Listen and enjoy! As always, we&amp;#8217;d love to hear your thoughts and dreams and deepest desires. If you want to learn more, check out these links: Tornado homepage node.js web page. Using Django on top of a Tornado web server. Ryan Dahl&amp;#8217;s livejournal page (the webcomic seems to [...]</itunes:summary><itunes:keywords>hacker,software,programmer,code,linux,programming,open,source,geek,geeky</itunes:keywords><feedburner:origLink>http://hackermedley.org/archives/86</feedburner:origLink><enclosure url="http://feedproxy.google.com/~r/HackerMedley/~5/YrBjbH4XaeM/medley3.mp3" length="14887025" type="audio/mpeg" /><feedburner:origEnclosureLink>http://hackermedley.org/podcasts/medley3.mp3</feedburner:origEnclosureLink></item>
+		<item>
+		<title>Episode 2: A brief introduction to NoSQL databases</title>
+		<link>http://feedproxy.google.com/~r/HackerMedley/~3/00JNcjwOMYo/51</link>
+		<comments>http://hackermedley.org/archives/51#comments</comments>
+		<pubDate>Thu, 21 Jan 2010 01:29:30 +0000</pubDate>
+		<dc:creator>nat nat org (Nat Friedman and Alex Graveley)</dc:creator>
+				<category><![CDATA[Uncategorized]]></category>
+
+		<guid isPermaLink="false">http://hackermedley.org/?p=51</guid>
+		<description><![CDATA[
+In our second episode (12 minutes long), Alex and Nat talk about the new generation of &#8220;NoSQL&#8221; databases that have created a lot of interest among web developers; especially those lucky people dealing with thousands of simultaneous users and terabytes of data.
+Please feel free to leave a comment below after you&#8217;ve listened to the episode. [...]]]></description>
+			<content:encoded><![CDATA[<p><img class="aligncenter" title="Google datacenter in Oregon" src="http://hackermedley.org/images/googledatacenter.jpg" alt="" width="500" height="375" /></p>
+<p>In our second episode (12 minutes long), Alex and Nat talk about the new generation of &#8220;NoSQL&#8221; databases that have created a lot of interest among web developers; especially those lucky people dealing with thousands of simultaneous users and terabytes of data.</p>
+
+<p>Please feel free to leave a comment below after you&#8217;ve listened to the episode. We&#8217;re still total newbies at this podcasting thing, so your feedback and encouragement are a big help!</p>
+<p>If you want to learn more about NoSQL than what we covered in the show, check out these links:</p>
+<ul>
+<li><a href="http://horicky.blogspot.com/2009/11/nosql-patterns.html">Nice introduction to all the basic concepts</a>: consistency models, replication, vector clocks.</li>
+<li><a href="http://www.vineetgupta.com/2010/01/nosql-databases-part-1-landscape.html">A comparison of NoSQL alternatives</a> and a good braindump of the subject matter.</li>
+<li><a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html ">Amazon Dynamo paper</a>. Great readable paper introducing the core concepts for massively scalable datastores.</li>
+<li><a href="http://labs.google.com/papers/bigtable.html">BigTable paper</a>. Another cornerstone paper.</li>
+<li><a href="http://bret.appspot.com/entry/how-friendfeed-uses-mysql">How FriendFeed uses MySQL to store schema-less data</a></li>
+</ul>
+<p>The Big Guys:</p>
+<ul>
+<li><a href="http://project-voldemort.com/">Voldemort</a></li>
+<li><a href="http://incubator.apache.org/cassandra/">Cassandra</a></li>
+<li><a href="http://wiki.apache.org/hadoop/HBase">HBase</a> &#8212; We didn&#8217;t get to this one, but it&#8217;s modelled on BigTable, and can replicate across geographically separated datacenters (Cassandra needs faster roundtrips).  And it&#8217;s what Hadoop uses internally.</li>
+</ul>
+<p>Midsized:</p>
+<ul>
+<li><a href="http://www.mongodb.org">MongoDB</a> &#8212; Great for storing JSON objects.</li>
+<li><a href="http://couchdb.apache.org/">CouchDB</a> &#8212; Erlang based, uses javascript as a query language.</li>
+</ul>
+<p>Niche:</p>
+<ul>
+<li><a href="http://code.google.com/p/redis">Redis</a> &#8212; memcached with persistence and useful list/set/ordered-set datatypes.</li>
+<li><a href="http://code.google.com/p/redis/wiki/TwitterAlikeExample">Redis twitter implementation</a> &#8212; simple example of building a twitter-like system on top of redis.</li>
+</ul>
+<p>Underlying Technology</p>
+<ul>
+<li><a href="http://www.spiteful.com/2008/03/17/programmers-toolbox-part-3-consistent-hashing/">Consistent Hashing</a>.</li>
+<li>Vector Clocks &#8212; See section 4.4 in the <a href="http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html ">Amazon Dynamo paper</a>.</li>
+<li>Important relationship between Consistency, Availability and Partition Tolerance, called the <a href="http://www.julianbrowne.com/article/viewer/brewers-cap-theorem">CAP Theorem</a>.</li>
+</ul>
+<p>The image above is a picture of a Google datacenter in Oregon, where they no doubt run <a href="http://labs.google.com/papers/bigtable.html">BigTable</a>.</p>
+]]></content:encoded>
+			<wfw:commentRss>http://hackermedley.org/archives/51/feed</wfw:commentRss>
+		<slash:comments>16</slash:comments>
+
+		<media:content url="http://feedproxy.google.com/~r/HackerMedley/~5/v7Xbw4Sq_xI/medley2.mp3" fileSize="17546499" type="audio/mpeg" /><itunes:explicit>no</itunes:explicit><itunes:subtitle> In our second episode (12 minutes long), Alex and Nat talk about the new generation of &amp;#8220;NoSQL&amp;#8221; databases that have created a lot of interest among web developers; especially those lucky people dealing with thousands of simultaneous users and </itunes:subtitle><itunes:author>Nat Friedman and Alex Graveley</itunes:author><itunes:summary> In our second episode (12 minutes long), Alex and Nat talk about the new generation of &amp;#8220;NoSQL&amp;#8221; databases that have created a lot of interest among web developers; especially those lucky people dealing with thousands of simultaneous users and terabytes of data. Please feel free to leave a comment below after you&amp;#8217;ve listened to the episode. [...]</itunes:summary><itunes:keywords>hacker,software,programmer,code,linux,programming,open,source,geek,geeky</itunes:keywords><feedburner:origLink>http://hackermedley.org/archives/51</feedburner:origLink><enclosure url="http://feedproxy.google.com/~r/HackerMedley/~5/v7Xbw4Sq_xI/medley2.mp3" length="17546499" type="audio/mpeg" /><feedburner:origEnclosureLink>http://hackermedley.org/podcasts/medley2.mp3</feedburner:origEnclosureLink></item>
+		<item>
+		<title>Pilot Show: The 26c3 and GSM security</title>
+		<link>http://feedproxy.google.com/~r/HackerMedley/~3/d1Wzoku_IXA/4</link>
+		<comments>http://hackermedley.org/archives/4#comments</comments>
+		<pubDate>Mon, 04 Jan 2010 22:22:19 +0000</pubDate>
+		<dc:creator>nat nat org (Nat Friedman and Alex Graveley)</dc:creator>
+				<category><![CDATA[Uncategorized]]></category>
+
+		<guid isPermaLink="false">http://hackermedley.org/?p=4</guid>
+		<description><![CDATA[
+Welcome to Hacker Medley!  We decided to try podcasting.
+In our pilot show, Nat Friedman shares what he learned about mobile phone security at the 26th annual Chaos Communications Congress in Berlin.
+It&#8217;s our first effort, so it&#8217;s a little rough. But please let us know what you think so we can decide whether or not [...]]]></description>
+			<content:encoded><![CDATA[<p style="text-align: center;"><img class=" aligncenter" title="Harald Welte presenting at the 26c3" src="http://hackermedley.org/images/haraldwelte.jpg" alt="" width="500" height="375" /></p>
+<p>Welcome to Hacker Medley!  We decided to try podcasting.</p>
+<p>In our pilot show, Nat Friedman shares what he learned about mobile phone security at the 26th annual Chaos Communications Congress in Berlin.</p>
+
+<hr />It&#8217;s our first effort, so it&#8217;s a little rough. But please <a href="http://spreadsheets.google.com/viewform?formkey=dEZGWDlnOE94czNXbW5qRzVKUU5ILVE6MA">let us know what you think</a> so we can decide whether or not to keep making these!</p>
+<p>If you want to learn more about the stuff Nat was describing, here are some handy links:</p>
+<ul>
+<li><a href="http://events.ccc.de/congress/2009/Fahrplan/attachments/1479_26C3.Karsten.Nohl.GSM.pdf">GSM &#8212; SRSLY?</a> Karsten Nohl&#8217;s talk in which he revealed his project to generate rainbow tables for A5/1.</li>
+<li><a href="http://openbsc.gnumonks.org/trac/wiki/OpenBSC">OpenBSC</a> Harald Welte&#8217;s project to create a GPL&#8217;d GSM Base Station Controller (and associated pieces) as a platform for security research.</li>
+<li>A <a href="http://cgi.ebay.com/ALCATEL-LUCENT-ANDREW-UMTS-BTS-1120-BASE-STATION_W0QQitemZ260395856814QQcmdZViewItemQQptZLH_DefaultDomain_0?hash=item3ca0cd73ae#ht_510wt_1164">$250 Alcatel/Lucent BST on eBay</a>. Use this to provide OpenBSC with a radio link.</li>
+<li><a href="http://openbts.sourceforge.net/">OpenBTS</a> Another effort to create an open source GSM network using a USRP (universal software radio peripheral).</li>
+<li><a href="http://wiki.openezx.org/BP">A description of the baseband processor / application processor split</a> in mobile phones.</li>
+<li>Interesting <a href="http://www.cryptophone.de/qa/intercept/index.html">information about GSM security</a> from the folks at CryptoPhone.</li>
+<li><a href="https://svn.berlin.ccc.de/projects/airprobe/">Airprobe</a> &#8212; a GSM sniffer project.</li>
+<li>Wikipedia&#8217;s explanation of <a href="http://en.wikipedia.org/wiki/Rainbow_table">rainbow tables</a>.</li>
+</ul>
+<p>The image above is <a href="http://gnumonks.org/~laforge/weblog/">Harald Welte</a> <a href="http://events.ccc.de/congress/2009/Fahrplan/events/3535.en.html">presenting at the 26c3</a>.</p>
+]]></content:encoded>
+			<wfw:commentRss>http://hackermedley.org/archives/4/feed</wfw:commentRss>
+		<slash:comments>18</slash:comments>
+
+		<media:content url="http://feedproxy.google.com/~r/HackerMedley/~5/OM-rUzLmlJ4/medley0.mp3" fileSize="21062970" type="audio/mpeg" /><itunes:explicit>no</itunes:explicit><itunes:subtitle> Welcome to Hacker Medley! We decided to try podcasting. In our pilot show, Nat Friedman shares what he learned about mobile phone security at the 26th annual Chaos Communications Congress in Berlin. It&amp;#8217;s our first effort, so it&amp;#8217;s a little rou</itunes:subtitle><itunes:author>Nat Friedman and Alex Graveley</itunes:author><itunes:summary> Welcome to Hacker Medley! We decided to try podcasting. In our pilot show, Nat Friedman shares what he learned about mobile phone security at the 26th annual Chaos Communications Congress in Berlin. It&amp;#8217;s our first effort, so it&amp;#8217;s a little rough. But please let us know what you think so we can decide whether or not [...]</itunes:summary><itunes:keywords>hacker,software,programmer,code,linux,programming,open,source,geek,geeky</itunes:keywords><feedburner:origLink>http://hackermedley.org/archives/4</feedburner:origLink><enclosure url="http://feedproxy.google.com/~r/HackerMedley/~5/OM-rUzLmlJ4/medley0.mp3" length="21062970" type="audio/mpeg" /><feedburner:origEnclosureLink>http://hackermedley.org/podcasts/medley0.mp3</feedburner:origEnclosureLink></item>
+	<copyright>Copyright Hacker Medley. Licensed CC-BY-SA.</copyright><media:credit role="author">Nat Friedman and Alex Graveley</media:credit><media:rating>nonadult</media:rating><media:description type="plain">A short podcast for curious hackers.</media:description></channel>
+</rss>