<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    
    <title>The Bird Grinder &mdash; The Bird Grinder v0.0.1 documentation</title>
    <link rel="stylesheet" href="_static/chunky.css" type="text/css" />
    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
    <script type="text/javascript">
      var DOCUMENTATION_OPTIONS = {
        URL_ROOT:    '',
        VERSION:     '0.0.1',
        COLLAPSE_INDEX: false,
        FILE_SUFFIX: '.html',
        HAS_SOURCE:  true
      };
    </script>
    <script type="text/javascript" src="_static/jquery.js"></script>
    <script type="text/javascript" src="_static/underscore.js"></script>
    <script type="text/javascript" src="_static/doctools.js"></script>
    <link rel="top" title="The Bird Grinder v0.0.1 documentation" href="#" /> 
  </head>
  <body>
    <div class="related">
      <h3>Navigation</h3>
      <ul>
        <li class="right" style="margin-right: 10px">
          <a href="genindex.html" title="General Index"
             accesskey="I">index</a></li>
        <li><a href="#">The Bird Grinder v0.0.1 documentation</a> &raquo;</li> 
      </ul>
    </div>  

    <div class="document">
      <div class="documentwrapper">
          <div class="body">
            
  <div class="section" id="the-bird-grinder">
<h1>The Bird Grinder</h1>
<div class="tagline container">
<div class="line-block">
<div class="line">A small <a class="reference external" href="http://python.org/">Python</a> library to cut through the spam &amp; off-topic fat,</div>
<div class="line">leaving only the <strong>meatiest</strong> tweets.</div>
</div>
</div>
<div class="quickstart container">
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">&#39;redis&#39;</span><span class="p">)</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">good_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">&#39;good&#39;</span><span class="p">)</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">bad_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">&#39;bad&#39;</span><span class="p">)</span>

<span class="n">grinder</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">unknown_tweets</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">grinder</span><span class="o">.</span><span class="n">grind</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">))</span>
</pre></div>
</div>
<div class="get container">
<p><a class="reference external" href="http://bitbucket.org/sjl/birdgrinder/">Fork it on BitBucket ➙</a>
<a class="reference external" href="http://github.com/sjl/birdgrinder/">Fork it on GitHub ➙</a></p>
<p><a class="reference external" href="http://bitbucket.org/sjl/birdgrinder/issues/">Report a bug ➙</a></p>
</div>
</div>
<div class="intro container">
<p>Trying to grab tweets about a <a class="reference external" href="http://hg-scm.org/">version control system</a> and getting tweets about <a class="reference external" href="http://en.wikipedia.org/wiki/Nike_Mercurial_Vapor">some kind of shoe</a> in the mix? Use the
Bird Grinder to filter out what you want.</p>
<p>Sharpen your grinder by training it on tweets you&#8217;ve already categorized, then
grind new tweets to find out if they pass the test.</p>
<p>Save your personalized grinder to, and restore it from, a variety of backends
like flat files or <a class="reference external" href="http://code.google.com/p/redis/">Redis</a>.</p>
<p>It&#8217;s <a class="reference external" href="http://en.wikipedia.org/wiki/MIT_License">MIT/X11</a> licensed.</p>
</div>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference external" href="#installation" id="id3">Installation</a></li>
<li><a class="reference external" href="#basic-usage" id="id4">Basic Usage</a></li>
<li><a class="reference external" href="#saving-and-restoring" id="id5">Saving and Restoring</a><ul>
<li><a class="reference external" href="#flat-files" id="id6">Flat Files</a></li>
<li><a class="reference external" href="#id1" id="id7">Redis</a></li>
</ul>
</li>
<li><a class="reference external" href="#advanced-usage" id="id8">Advanced Usage</a></li>
<li><a class="reference external" href="#contributing" id="id9">Contributing</a></li>
</ul>
</div>
<div class="section" id="installation">
<h2><a class="toc-backref" href="#id3">Installation</a></h2>
<p>Install with <a class="reference external" href="http://pip.openplans.org/">pip</a>, using either <a class="reference external" href="http://hg-scm.org">Mercurial</a> or <a class="reference external" href="http://git-scm.com/">Git</a>:</p>
<div class="highlight-python"><pre>pip install -e hg+http://bitbucket.org/sjl/birdgrinder/#egg=birdgrinder
pip install -e git+http://github.com/sjl/birdgrinder/#egg=birdgrinder</pre>
</div>
</div>
<div class="section" id="basic-usage">
<h2><a class="toc-backref" href="#id4">Basic Usage</a></h2>
<p>The Bird Grinder helps you filter out spam and off-topic tweets you receive from
Twitter.</p>
<p>Before you can filter for the tweets you want, you&#8217;ll need to create a <tt class="docutils literal"><span class="pre">Grinder</span></tt>
and train or &#8220;sharpen&#8221; it with some tweets that you&#8217;ve marked as &#8220;good&#8221; and
&#8220;bad&#8221;:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">bad_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">&#39;bad&#39;</span><span class="p">)</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">good_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">&#39;good&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>The <tt class="docutils literal"><span class="pre">sharpen()</span></tt> method expects two arguments: a tweet (in the raw-JSON form
you received from Twitter) and a category (either <tt class="docutils literal"><span class="pre">'good'</span></tt> or <tt class="docutils literal"><span class="pre">'bad'</span></tt>).</p>
<p>Feel free to determine the categories of these &#8220;training tweets&#8221; however you
like. You may want to manually categorize a number of tweets, or implement
some sort of user-powered rating system. It&#8217;s up to you.</p>
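<p>As one possible approach, training data can be labeled by a simple rule and fed to <tt class="docutils literal"><span class="pre">sharpen()</span></tt> in a loop. The sketch below is written for Python 3 and runs without the library installed: <tt class="docutils literal"><span class="pre">StubGrinder</span></tt>, <tt class="docutils literal"><span class="pre">label_by_keyword()</span></tt> and <tt class="docutils literal"><span class="pre">train()</span></tt> are hypothetical names made up for illustration, the sample tweets are invented, and reading the tweet body from a <tt class="docutils literal"><span class="pre">text</span></tt> field is an assumption about the JSON you receive.</p>

```python
import json


class StubGrinder(object):
    """Hypothetical stand-in that only records calls to sharpen()."""

    def __init__(self):
        self.seen = []

    def sharpen(self, tweet_json, category):
        if category not in ('good', 'bad'):
            raise ValueError("category must be 'good' or 'bad'")
        self.seen.append((tweet_json, category))


def label_by_keyword(tweet_json, bad_words):
    """Illustrative rule: mark a tweet 'bad' if it mentions any bad word."""
    # Assumption: the tweet body lives in a 'text' field of the JSON.
    text = json.loads(tweet_json)['text'].lower()
    if any(word in text for word in bad_words):
        return 'bad'
    return 'good'


def train(grinder, tweets, bad_words):
    """Label each raw-JSON tweet and pass it to grinder.sharpen()."""
    for tweet_json in tweets:
        grinder.sharpen(tweet_json, label_by_keyword(tweet_json, bad_words))


# Invented sample tweets, serialized the way raw JSON would arrive.
tweets = [
    json.dumps({'text': 'Branching in Mercurial is easy'}),
    json.dumps({'text': 'New Nike Mercurial Vapor boots on sale'}),
]
grinder = StubGrinder()
train(grinder, tweets, bad_words=['nike', 'boots'])
```

<p>With the real library you would pass a <tt class="docutils literal"><span class="pre">Grinder</span></tt> instance instead of the stub. A keyword rule like this is only a way to bootstrap a training set; manual review will usually give better labels.</p>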
<p>Once you&#8217;ve sharpened your grinder you can use it to filter new tweets:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">new_tweets</span><span class="p">:</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">grinder</span><span class="o">.</span><span class="n">grind</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">)</span>

    <span class="k">if</span> <span class="n">result</span> <span class="o">==</span> <span class="s">&#39;good&#39;</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">&#39;This tweet is satisfactory.&#39;</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">result</span> <span class="o">==</span> <span class="s">&#39;bad&#39;</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">&#39;This tweet is unacceptable.&#39;</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">result</span> <span class="o">==</span> <span class="s">&#39;unknown&#39;</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">&#39;Not enough information to classify this tweet.&#39;</span><span class="p">)</span>
</pre></div>
</div>
<p>The <tt class="docutils literal"><span class="pre">grind()</span></tt> method returns one of three results: <tt class="docutils literal"><span class="pre">'good'</span></tt>, <tt class="docutils literal"><span class="pre">'bad'</span></tt>, or
<tt class="docutils literal"><span class="pre">'unknown'</span></tt>.</p>
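<p>Because there are only three possible results, it can be handy to sort a batch of tweets into one bucket per result. The following is a minimal sketch of that idea: <tt class="docutils literal"><span class="pre">partition()</span></tt> is a hypothetical helper, and <tt class="docutils literal"><span class="pre">fake_grind()</span></tt> is a made-up classifier that exists only so the example runs without a trained grinder.</p>

```python
from collections import defaultdict


def partition(grind, tweets):
    """Group tweets by the category that grind() returns for each one."""
    buckets = defaultdict(list)
    for tweet_json in tweets:
        buckets[grind(tweet_json)].append(tweet_json)
    return buckets


def fake_grind(tweet_json):
    """Hypothetical classifier used only to make this sketch runnable."""
    if 'hg' in tweet_json:
        return 'good'
    if 'shoe' in tweet_json:
        return 'bad'
    return 'unknown'


buckets = partition(fake_grind, ['{"text": "hg rocks"}',
                                 '{"text": "shoe sale"}',
                                 '{"text": "hello"}'])
for category in ('good', 'bad', 'unknown'):
    print('%s: %d' % (category, len(buckets[category])))
```

<p>With a sharpened grinder you would call <tt class="docutils literal"><span class="pre">partition(grinder.grind,</span> <span class="pre">new_tweets)</span></tt>. Keeping the <tt class="docutils literal"><span class="pre">'unknown'</span></tt> bucket separate gives you a natural queue of tweets to categorize by hand and feed back into <tt class="docutils literal"><span class="pre">sharpen()</span></tt>.</p>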
</div>
<div class="section" id="saving-and-restoring">
<h2><a class="toc-backref" href="#id5">Saving and Restoring</a></h2>
<p>Once you get your grinder nice and sharp you&#8217;ll probably want to save it so you
can use it again later.</p>
<p>If some data is already saved, then creating a new grinder will initialize it
with that data. To save the data, call <tt class="docutils literal"><span class="pre">grinder.save()</span></tt>; to restore from
the last saved state, call <tt class="docutils literal"><span class="pre">grinder.restore()</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>

<span class="c"># ... train ...</span>

<span class="n">grinder</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>

<span class="c"># Create a new grinder and restore the saved data automatically.</span>
<span class="n">new_grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>

<span class="c"># ... train with bad data ...</span>

<span class="c"># Throw away the bad data and restore the last saved state.</span>
<span class="n">new_grinder</span><span class="o">.</span><span class="n">restore</span><span class="p">()</span>
</pre></div>
</div>
<p>The Bird Grinder can save its data to, and restore it from, a number of storage
backends. The default is JSON data stored in flat files.</p>
<div class="section" id="flat-files">
<h3><a class="toc-backref" href="#id6">Flat Files</a></h3>
<p>JSON data in flat files is used by default because it&#8217;s supported everywhere.
You can provide a filename when creating your grinder if you like &#8211; the
default is <tt class="docutils literal"><span class="pre">./birdgrinder.json</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="c"># Saves/restores to/from ./birdgrinder.json</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>

<span class="c"># Saves/restores to/from /tmp/birdgrinder.json</span>
<span class="n">other_grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s">&#39;/tmp/birdgrinder.json&#39;</span><span class="p">)</span>
</pre></div>
</div>
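<p>To make the save and restore behaviour concrete, here is a rough model of how JSON flat-file persistence could work. It is a sketch under stated assumptions, not the Bird Grinder&#8217;s own code or file format: <tt class="docutils literal"><span class="pre">FlatFileStore</span></tt> and its <tt class="docutils literal"><span class="pre">data</span></tt> dictionary are hypothetical, and only the behaviour described above (load on creation if a file exists, <tt class="docutils literal"><span class="pre">save()</span></tt>, <tt class="docutils literal"><span class="pre">restore()</span></tt>) is taken from this documentation.</p>

```python
import json
import os
import tempfile


class FlatFileStore(object):
    """Hypothetical model of save/restore; not the Bird Grinder's own code."""

    def __init__(self, filename):
        self.filename = filename
        self.data = {}
        # Mirror the documented behaviour: load saved data if any exists.
        if os.path.exists(filename):
            self.restore()

    def save(self):
        """Write the current state to the flat file as JSON."""
        with open(self.filename, 'w') as f:
            json.dump(self.data, f)

    def restore(self):
        """Replace the current state with the last saved state."""
        with open(self.filename) as f:
            self.data = json.load(f)


path = os.path.join(tempfile.mkdtemp(), 'birdgrinder.json')

store = FlatFileStore(path)
store.data['good'] = 3
store.save()

# A new object pointed at the same file starts from the saved state.
new_store = FlatFileStore(path)
new_store.data['good'] = 99   # unsaved change
new_store.restore()           # discard it and return to the saved state
print(new_store.data)
```

<p>The point of the model is the ordering: anything changed after the last <tt class="docutils literal"><span class="pre">save()</span></tt> is lost on <tt class="docutils literal"><span class="pre">restore()</span></tt>, so save once you are happy with a round of training.</p>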
</div>
<div class="section" id="id1">
<h3><a class="toc-backref" href="#id7">Redis</a></h3>
<p>If you&#8217;re going to be saving and/or restoring frequently, you may want to store
the data in <a class="reference external" href="http://code.google.com/p/redis/">Redis</a> for better performance.</p>
<p>To use Redis you can pass <tt class="docutils literal"><span class="pre">storage='redis'</span></tt> when you create your Grinder, and
then use <tt class="docutils literal"><span class="pre">grinder.save()</span></tt> to save your data as usual. You can pass an
optional <tt class="docutils literal"><span class="pre">key_prefix</span></tt> argument to specify a custom prefix for the keys &#8211; the
default is <tt class="docutils literal"><span class="pre">birdgrinder</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="c"># Saves/restores to/from birdgrinder:*</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">&#39;redis&#39;</span><span class="p">)</span>

<span class="c"># Saves/restores to/from anothergrinder:*</span>
<span class="n">other_grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">&#39;redis&#39;</span><span class="p">,</span> <span class="n">key_prefix</span><span class="o">=</span><span class="s">&#39;anothergrinder&#39;</span><span class="p">)</span>
</pre></div>
</div>
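<p>The prefix matters when several grinders share one Redis database, since each grinder then only touches keys under its own namespace. The sketch below illustrates that idea with a plain dictionary standing in for Redis. The <tt class="docutils literal"><span class="pre">prefix:name</span></tt> pattern follows the comments above, but the key names after the colon, and the helpers <tt class="docutils literal"><span class="pre">make_key()</span></tt> and <tt class="docutils literal"><span class="pre">keys_for()</span></tt>, are assumptions made for illustration.</p>

```python
def make_key(key_prefix, name):
    """Build a namespaced key such as 'birdgrinder:good' (name is made up)."""
    return '%s:%s' % (key_prefix, name)


def keys_for(db, key_prefix):
    """Return the keys in db that belong to one prefix, sorted."""
    return sorted(k for k in db if k.startswith(key_prefix + ':'))


# A plain dict stands in for a Redis database in this sketch.
db = {}
db[make_key('birdgrinder', 'good')] = 10
db[make_key('birdgrinder', 'bad')] = 4
db[make_key('anothergrinder', 'good')] = 7

print(keys_for(db, 'birdgrinder'))
print(keys_for(db, 'anothergrinder'))
```

<p>Matching on the prefix plus the colon, rather than the bare prefix, keeps a grinder named <tt class="docutils literal"><span class="pre">bird</span></tt> from picking up keys that belong to <tt class="docutils literal"><span class="pre">birdgrinder</span></tt>.</p>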
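<p>If your Redis instance isn&#8217;t running on localhost, on the default port, or with the
default database number, you can pass <tt class="docutils literal"><span class="pre">host</span></tt>, <tt class="docutils literal"><span class="pre">port</span></tt> and <tt class="docutils literal"><span class="pre">db</span></tt> as well:</p>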
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">&#39;redis&#39;</span><span class="p">,</span> <span class="n">host</span><span class="o">=</span><span class="s">&#39;192.168.0.16&#39;</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">7000</span><span class="p">,</span> <span class="n">db</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
</pre></div>
</div>
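<p>If your deployment keeps its Redis location in a single URL, a small helper can turn it into these keyword arguments. This is a sketch for Python 3: <tt class="docutils literal"><span class="pre">redis_kwargs()</span></tt> is a hypothetical helper, and the fallbacks (localhost, port 6379, database 0) are Redis&#8217; usual defaults, assumed here to match the grinder&#8217;s own.</p>

```python
from urllib.parse import urlparse


def redis_kwargs(url):
    """Turn 'redis://host:port/db' into keyword arguments for Grinder()."""
    parts = urlparse(url)
    return {
        'storage': 'redis',
        'host': parts.hostname or 'localhost',
        'port': parts.port or 6379,            # Redis' standard port
        'db': int(parts.path.lstrip('/') or 0),
    }


kwargs = redis_kwargs('redis://192.168.0.16:7000/2')
print(kwargs)
# With the library installed: grinder = Grinder(**kwargs)
```

<p>Keeping the connection details in one place like this makes it easier to point development and production grinders at different Redis instances.</p>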
</div>
</div>
<div class="section" id="advanced-usage">
<h2><a class="toc-backref" href="#id8">Advanced Usage</a></h2>
<p>TODO: Later.</p>
</div>
<div class="section" id="contributing">
<h2><a class="toc-backref" href="#id9">Contributing</a></h2>
<p>To contribute bug fixes, performance improvements, or new features, just fork the
<a class="reference external" href="http://bitbucket.org/sjl/birdgrinder/">BitBucket repository</a> or <a class="reference external" href="http://github.com/sjl/birdgrinder/">GitHub
repository</a> and send a pull request.</p>
</div>
</div>


          </div>
      </div>
      <div class="clearer"></div>
    </div>
    <div class="footer">
        &copy; Copyright 2010, Steve Losh.
      Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.0.
    </div>
  </body>
</html>