<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>The Bird Grinder — The Bird Grinder v0.0.1 documentation</title>
<link rel="stylesheet" href="_static/chunky.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = { URL_ROOT: '', VERSION: '0.0.1', COLLAPSE_INDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true };
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="The Bird Grinder v0.0.1 documentation" href="#" />
</head>
<body>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px"> <a href="genindex.html" title="General Index" accesskey="I">index</a></li>
<li><a href="#">The Bird Grinder v0.0.1 documentation</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="body">
<div class="section" id="the-bird-grinder">
<h1>The Bird Grinder</h1>
<div class="tagline container">
<div class="line-block">
<div class="line">A small <a class="reference external" href="http://python.org/">Python</a> library to cut through the spam &amp; off-topic fat,</div>
<div class="line">leaving only the <strong>meatiest</strong> tweets.</div>
</div>
</div>
<div class="quickstart container">
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">'redis'</span><span class="p">)</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">good_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">'good'</span><span class="p">)</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">bad_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">'bad'</span><span class="p">)</span>

<span class="n">grinder</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">unknown_tweets</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">grinder</span><span class="o">.</span><span class="n">grind</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">))</span>
</pre></div>
</div>
<div class="get container">
<p><a class="reference external" href="http://bitbucket.org/sjl/birdgrinder/">Fork it on BitBucket ➙</a> <a class="reference external" href="http://github.com/sjl/birdgrinder/">Fork it on GitHub ➙</a></p>
<p><a class="reference external" href="http://bitbucket.org/sjl/birdgrinder/issues/">Report a bug ➙</a></p>
</div>
</div>
<div class="intro container">
<p>Trying to grab tweets about a <a class="reference external" href="http://hg-scm.org/">version control system</a> and getting tweets about <a class="reference external" href="http://en.wikipedia.org/wiki/Nike_Mercurial_Vapor">some kind of shoe</a> in the mix?
Use the Bird Grinder to filter out what you don’t want.</p>
<p>Sharpen your grinder by training it with example tweets, then grind new tweets to find out whether they pass the test.</p>
<p>Save and restore your personalized grinder to and from a variety of backends, like flat files or <a class="reference external" href="http://code.google.com/p/redis/">Redis</a>.</p>
<p>It’s <a class="reference external" href="http://en.wikipedia.org/wiki/MIT_License">MIT/X11</a> licensed.</p>
</div>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference external" href="#installation" id="id3">Installation</a></li>
<li><a class="reference external" href="#basic-usage" id="id4">Basic Usage</a></li>
<li><a class="reference external" href="#saving-and-restoring" id="id5">Saving and Restoring</a><ul>
<li><a class="reference external" href="#flat-files" id="id6">Flat Files</a></li>
<li><a class="reference external" href="#id1" id="id7">Redis</a></li>
</ul>
</li>
<li><a class="reference external" href="#advanced-usage" id="id8">Advanced Usage</a></li>
<li><a class="reference external" href="#contributing" id="id9">Contributing</a></li>
</ul>
</div>
<div class="section" id="installation">
<h2><a class="toc-backref" href="#id3">Installation</a></h2>
<p>Install with <a class="reference external" href="http://pip.openplans.org/">pip</a> and either <a class="reference external" href="http://hg-scm.org">Mercurial</a> or <a class="reference external" href="http://git-scm.com/">git</a>:</p>
<div class="highlight-python"><pre>pip install -e hg+http://bitbucket.org/sjl/birdgrinder/#egg=birdgrinder
pip install -e git+http://github.com/sjl/birdgrinder/#egg=birdgrinder</pre>
</div>
</div>
<div class="section" id="basic-usage">
<h2><a class="toc-backref" href="#id4">Basic Usage</a></h2>
<p>The Bird Grinder helps you filter out spam and off-topic tweets you receive from Twitter.</p>
<p>Before you can filter the tweets you want, you’ll need to create a <tt class="docutils literal"><span class="pre">Grinder</span></tt> and train or “sharpen” it with some tweets that you’ve marked as “good” or “bad”:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">bad_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">'bad'</span><span class="p">)</span>

<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">good_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">'good'</span><span class="p">)</span>
</pre></div>
</div>
<p>The <tt class="docutils literal"><span class="pre">sharpen()</span></tt> method expects two arguments: a tweet (in the raw-JSON form you received from Twitter) and a category (either <tt class="docutils literal"><span class="pre">'good'</span></tt> or <tt class="docutils literal"><span class="pre">'bad'</span></tt>).</p>
<p>Feel free to determine the categories of these “training tweets” however you like. You may want to manually categorize a number of tweets, or implement some sort of user-powered rating system.
It’s up to you.</p>
<p>Once you’ve sharpened your grinder, you can use it to filter new tweets:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">new_tweets</span><span class="p">:</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">grinder</span><span class="o">.</span><span class="n">grind</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">result</span> <span class="o">==</span> <span class="s">'good'</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'This tweet is satisfactory.'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">result</span> <span class="o">==</span> <span class="s">'bad'</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'This tweet is unacceptable.'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">result</span> <span class="o">==</span> <span class="s">'unknown'</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'Not enough information to classify this tweet.'</span><span class="p">)</span>
</pre></div>
</div>
<p>The <tt class="docutils literal"><span class="pre">grind()</span></tt> method returns one of three results: <tt class="docutils literal"><span class="pre">'good'</span></tt>, <tt class="docutils literal"><span class="pre">'bad'</span></tt>, or <tt class="docutils literal"><span class="pre">'unknown'</span></tt>.</p>
</div>
<div class="section" id="saving-and-restoring">
<h2><a class="toc-backref" href="#id5">Saving and Restoring</a></h2>
<p>Once you get your grinder nice and sharp, you’ll probably want to save it so you can use it again later.</p>
<p>If some data has already been saved, creating a new grinder will initialize it with that data.
To save the data, call <tt class="docutils literal"><span class="pre">grinder.save()</span></tt>; to restore from the last saved state, call <tt class="docutils literal"><span class="pre">grinder.restore()</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>
<span class="c"># ... train ...</span>
<span class="n">grinder</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>

<span class="c"># Create a new grinder and restore the saved data automatically.</span>
<span class="n">new_grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>

<span class="c"># ... train with bad data ...</span>

<span class="c"># Throw away the bad data and restore the last saved state.</span>
<span class="n">new_grinder</span><span class="o">.</span><span class="n">restore</span><span class="p">()</span>
</pre></div>
</div>
<p>The Bird Grinder can save to and restore from a number of storage backends. The default is JSON data stored in flat files.</p>
<div class="section" id="flat-files">
<h3><a class="toc-backref" href="#id6">Flat Files</a></h3>
<p>JSON data in flat files is used by default because it’s supported everywhere.
You can provide a filename when creating your grinder if you like – the default is <tt class="docutils literal"><span class="pre">./birdgrinder.json</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="c"># Saves/restores to/from ./birdgrinder.json</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>

<span class="c"># Saves/restores to/from /tmp/birdgrinder.json</span>
<span class="n">other_grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s">'/tmp/birdgrinder.json'</span><span class="p">)</span>
</pre></div>
</div>
</div>
<div class="section" id="id1">
<h3><a class="toc-backref" href="#id7">Redis</a></h3>
<p>If you’re going to be saving and/or restoring frequently, you may want to store the data in <a class="reference external" href="http://code.google.com/p/redis/">Redis</a> for better performance.</p>
<p>To use Redis, pass <tt class="docutils literal"><span class="pre">storage='redis'</span></tt> when you create your Grinder, then call <tt class="docutils literal"><span class="pre">grinder.save()</span></tt> to save your data as usual.
You can pass an optional <tt class="docutils literal"><span class="pre">key_prefix</span></tt> argument to specify a custom prefix for the keys – the default is <tt class="docutils literal"><span class="pre">birdgrinder</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="c"># Saves/restores to/from birdgrinder:*</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">'redis'</span><span class="p">)</span>

<span class="c"># Saves/restores to/from anothergrinder:*</span>
<span class="n">other_grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">'redis'</span><span class="p">,</span> <span class="n">key_prefix</span><span class="o">=</span><span class="s">'anothergrinder'</span><span class="p">)</span>
</pre></div>
</div>
<p>If your Redis instance isn’t running on localhost with the default port and database number, you can specify those as well:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>

<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">'redis'</span><span class="p">,</span> <span class="n">host</span><span class="o">=</span><span class="s">'192.168.0.16'</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">7000</span><span class="p">,</span> <span class="n">db</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="section" id="advanced-usage">
<h2><a class="toc-backref" href="#id8">Advanced Usage</a></h2>
<p>TODO: Later.</p>
</div>
<div class="section" id="contributing">
<h2><a class="toc-backref" href="#id9">Contributing</a></h2>
<p>To contribute bug fixes, performance improvements, or new features, fork the <a class="reference external" href="http://bitbucket.org/sjl/birdgrinder/">BitBucket repository</a> or <a class="reference external" href="http://github.com/sjl/birdgrinder/">GitHub repository</a> and send a pull request.</p>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px"> <a href="genindex.html" title="General Index" >index</a></li>
<li><a href="#">The Bird Grinder v0.0.1 documentation</a> »</li>
</ul>
</div>
<div class="footer"> © Copyright 2010, Steve Losh. Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.0. </div>
</body>
</html>