birdgrinder/index.html @ ff4f055bb7f9
roul: Update site.
author |
Steve Losh <steve@stevelosh.com> |
date |
Sat, 07 Apr 2012 17:30:54 -0400 |
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>The Bird Grinder — The Bird Grinder v0.0.1 documentation</title>
<link rel="stylesheet" href="_static/chunky.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '',
VERSION: '0.0.1',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="top" title="The Bird Grinder v0.0.1 documentation" href="#" />
</head>
<body>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li><a href="#">The Bird Grinder v0.0.1 documentation</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="body">
<div class="section" id="the-bird-grinder">
<h1>The Bird Grinder</h1>
<div class="tagline container">
<div class="line-block">
<div class="line">A small <a class="reference external" href="http://python.org/">Python</a> library to cut through the spam &amp; off-topic fat,</div>
<div class="line">leaving only the <strong>meatiest</strong> tweets.</div>
</div>
</div>
<div class="quickstart container">
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">'redis'</span><span class="p">)</span>
<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">good_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">'good'</span><span class="p">)</span>
<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">bad_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">'bad'</span><span class="p">)</span>
<span class="n">grinder</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">unknown_tweets</span><span class="p">:</span>
    <span class="k">print</span><span class="p">(</span><span class="n">grinder</span><span class="o">.</span><span class="n">grind</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">))</span>
</pre></div>
</div>
<div class="get container">
<p><a class="reference external" href="http://bitbucket.org/sjl/birdgrinder/">Fork it on BitBucket ➙</a>
<a class="reference external" href="http://github.com/sjl/birdgrinder/">Fork it on GitHub ➙</a></p>
<p><a class="reference external" href="http://bitbucket.org/sjl/birdgrinder/issues/">Report a bug ➙</a></p>
</div>
</div>
<div class="intro container">
<p>Trying to grab tweets about a <a class="reference external" href="http://hg-scm.org/">version control system</a> and getting tweets about <a class="reference external" href="http://en.wikipedia.org/wiki/Nike_Mercurial_Vapor">some kind of shoe</a> in the mix? Use the
Bird Grinder to keep only the tweets you want.</p>
<p>Sharpen your grinder by training it on tweets you’ve already categorized, then grind
new tweets to find out whether they pass the test.</p>
<p>Save your personalized grinder to, and restore it from, backends such as
flat files or <a class="reference external" href="http://code.google.com/p/redis/">Redis</a>.</p>
<p>It’s <a class="reference external" href="http://en.wikipedia.org/wiki/MIT_License">MIT/X11</a> licensed.</p>
</div>
<div class="contents local topic" id="contents">
<ul class="simple">
<li><a class="reference external" href="#installation" id="id3">Installation</a></li>
<li><a class="reference external" href="#basic-usage" id="id4">Basic Usage</a></li>
<li><a class="reference external" href="#saving-and-restoring" id="id5">Saving and Restoring</a><ul>
<li><a class="reference external" href="#flat-files" id="id6">Flat Files</a></li>
<li><a class="reference external" href="#id1" id="id7">Redis</a></li>
</ul>
</li>
<li><a class="reference external" href="#advanced-usage" id="id8">Advanced Usage</a></li>
<li><a class="reference external" href="#contributing" id="id9">Contributing</a></li>
</ul>
</div>
<div class="section" id="installation">
<h2><a class="toc-backref" href="#id3">Installation</a></h2>
<p>Install with <a class="reference external" href="http://pip.openplans.org/">pip</a> and <a class="reference external" href="http://hg-scm.org">Mercurial</a> or <a class="reference external" href="http://git-scm.com/">git</a>:</p>
<div class="highlight-python"><pre>pip install -e hg+http://bitbucket.org/sjl/birdgrinder/#egg=birdgrinder
pip install -e git+http://github.com/sjl/birdgrinder/#egg=birdgrinder</pre>
</div>
</div>
<div class="section" id="basic-usage">
<h2><a class="toc-backref" href="#id4">Basic Usage</a></h2>
<p>The Bird Grinder helps you filter out spam and off-topic tweets you receive from
Twitter.</p>
<p>Before you can filter the tweets you want, you’ll need to create a <tt class="docutils literal"><span class="pre">Grinder</span></tt>
and train or “sharpen” it with some tweets that you’ve marked as “good” and
“bad”:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>
<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">bad_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">'bad'</span><span class="p">)</span>
<span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">good_tweets</span><span class="p">:</span>
    <span class="n">grinder</span><span class="o">.</span><span class="n">sharpen</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">,</span> <span class="s">'good'</span><span class="p">)</span>
</pre></div>
</div>
<p>The <tt class="docutils literal"><span class="pre">sharpen()</span></tt> method expects two arguments: a tweet (in the raw-JSON form
you received from Twitter) and a category (either <tt class="docutils literal"><span class="pre">'good'</span></tt> or <tt class="docutils literal"><span class="pre">'bad'</span></tt>).</p>
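<p>Because <tt class="docutils literal"><span class="pre">sharpen()</span></tt> accepts only those two categories and raw JSON, it can help to check each training pair before passing it in. The sketch below is a minimal example under stated assumptions: <tt class="docutils literal"><span class="pre">check_training_pair()</span></tt> is a hypothetical helper, not part of the Bird Grinder, and it assumes tweets follow Twitter’s classic JSON payloads with a <tt class="docutils literal"><span class="pre">"text"</span></tt> field. This page doesn’t say how <tt class="docutils literal"><span class="pre">sharpen()</span></tt> itself reacts to bad input.</p>
<div class="highlight-python"><pre>import json

# The only two categories sharpen() accepts, per the paragraph above.
VALID_CATEGORIES = ('good', 'bad')


def check_training_pair(tweet_json, category):
    """Raise ValueError if this pair looks unsuitable for sharpen().

    Hypothetical helper: assumes each tweet is a raw JSON string that
    decodes to an object with a "text" field.
    """
    if category not in VALID_CATEGORIES:
        raise ValueError('category must be "good" or "bad", got %r' % (category,))
    try:
        tweet = json.loads(tweet_json)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        raise ValueError('tweet is not valid JSON')
    if not isinstance(tweet, dict) or 'text' not in tweet:
        raise ValueError('tweet JSON has no "text" field')
    return tweet</pre>
</div>
<p>You might call it on each pair just before <tt class="docutils literal"><span class="pre">grinder.sharpen(tweet_json,</span> <span class="pre">category)</span></tt>, so a mislabeled or truncated tweet fails loudly instead of quietly skewing your training data.</p>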
<p>Feel free to determine the categories of these “training tweets” however you
like. You may want to manually categorize a number of tweets, or implement
some sort of user-powered rating system. It’s up to you.</p>
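<p>If you’re starting with no labels at all, one way to bootstrap is a crude keyword heuristic that labels only the obvious cases and leaves the rest for a human. This is a sketch, not a recommendation from the Bird Grinder itself: the keyword sets, <tt class="docutils literal"><span class="pre">rough_label()</span></tt>, and <tt class="docutils literal"><span class="pre">split_for_training()</span></tt> are hypothetical, tuned to the version-control-versus-shoe example, and it assumes each raw tweet has a <tt class="docutils literal"><span class="pre">"text"</span></tt> field.</p>
<div class="highlight-python"><pre>import json
import re

# Hypothetical keyword sets for the Mercurial-versus-shoe example above.
# They are illustrative only; tune them for your own topic.
GOOD_WORDS = {'hg', 'commit', 'changeset', 'repository', 'merge'}
BAD_WORDS = {'nike', 'shoe', 'shoes', 'boots', 'football', 'vapor'}


def rough_label(tweet_json):
    """Return 'good', 'bad', or None when the keywords don't decide."""
    text = json.loads(tweet_json).get('text', '')
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    good_hits = words.intersection(GOOD_WORDS)
    bad_hits = words.intersection(BAD_WORDS)
    if good_hits and not bad_hits:
        return 'good'
    if bad_hits and not good_hits:
        return 'bad'
    return None  # no hits or mixed signals: leave it for a human


def split_for_training(raw_tweets):
    """Sort raw JSON tweets into good, bad, and unlabeled lists."""
    good, bad, unlabeled = [], [], []
    for tweet_json in raw_tweets:
        label = rough_label(tweet_json)
        if label == 'good':
            good.append(tweet_json)
        elif label == 'bad':
            bad.append(tweet_json)
        else:
            unlabeled.append(tweet_json)
    return good, bad, unlabeled</pre>
</div>
<p>The <tt class="docutils literal"><span class="pre">good</span></tt> and <tt class="docutils literal"><span class="pre">bad</span></tt> lists can then be fed to <tt class="docutils literal"><span class="pre">sharpen()</span></tt>, with <tt class="docutils literal"><span class="pre">unlabeled</span></tt> reviewed by hand. Keep in mind that a grinder trained only on keyword-labeled tweets may learn little beyond the keywords themselves.</p>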
<p>Once you’ve sharpened your grinder you can use it to filter new tweets:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="k">for</span> <span class="n">tweet_json</span> <span class="ow">in</span> <span class="n">new_tweets</span><span class="p">:</span>
    <span class="n">result</span> <span class="o">=</span> <span class="n">grinder</span><span class="o">.</span><span class="n">grind</span><span class="p">(</span><span class="n">tweet_json</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">result</span> <span class="o">==</span> <span class="s">'good'</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'This tweet is satisfactory.'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">result</span> <span class="o">==</span> <span class="s">'bad'</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'This tweet is unacceptable.'</span><span class="p">)</span>
    <span class="k">elif</span> <span class="n">result</span> <span class="o">==</span> <span class="s">'unknown'</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="s">'Not enough information to classify this tweet.'</span><span class="p">)</span>
</pre></div>
</div>
<p>The <tt class="docutils literal"><span class="pre">grind()</span></tt> method returns one of three results: <tt class="docutils literal"><span class="pre">'good'</span></tt>, <tt class="docutils literal"><span class="pre">'bad'</span></tt>, or
<tt class="docutils literal"><span class="pre">'unknown'</span></tt>.</p>
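<p>When processing a batch, it’s often handier to group tweets by result than to branch on each one. The sketch below assumes only what this page states, that <tt class="docutils literal"><span class="pre">grind()</span></tt> returns one of the three strings; <tt class="docutils literal"><span class="pre">grind_all()</span></tt> is a hypothetical helper, and because it only calls <tt class="docutils literal"><span class="pre">grind()</span></tt> you can pass it any stand-in object with that method when testing.</p>
<div class="highlight-python"><pre># The three results grind() can return, per the paragraph above.
RESULTS = ('good', 'bad', 'unknown')


def grind_all(grinder, tweets):
    """Group raw JSON tweets by the result of grinder.grind(tweet)."""
    buckets = {result: [] for result in RESULTS}
    for tweet_json in tweets:
        result = grinder.grind(tweet_json)
        if result not in buckets:
            # Fail loudly rather than silently dropping the tweet.
            raise ValueError('unexpected grind() result: %r' % (result,))
        buckets[result].append(tweet_json)
    return buckets</pre>
</div>
<p>With <tt class="docutils literal"><span class="pre">buckets</span> <span class="pre">=</span> <span class="pre">grind_all(grinder,</span> <span class="pre">new_tweets)</span></tt> you would keep <tt class="docutils literal"><span class="pre">buckets['good']</span></tt>; one option for <tt class="docutils literal"><span class="pre">buckets['unknown']</span></tt> is to label those tweets by hand and sharpen with them later.</p>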
</div>
<div class="section" id="saving-and-restoring">
<h2><a class="toc-backref" href="#id5">Saving and Restoring</a></h2>
<p>Once you get your grinder nice and sharp you’ll probably want to save it so you
can use it again later.</p>
<p>If saved data already exists, creating a new grinder initializes it with that data automatically. To save the data call <tt class="docutils literal"><span class="pre">grinder.save()</span></tt>; to throw away any training done since then and
return to the last saved state, call <tt class="docutils literal"><span class="pre">grinder.restore()</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>
<span class="c"># ... train ...</span>
<span class="n">grinder</span><span class="o">.</span><span class="n">save</span><span class="p">()</span>
<span class="c"># Create a new grinder and restore the saved data automatically.</span>
<span class="n">new_grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>
<span class="c"># ... train with bad data ...</span>
<span class="c"># Throw away the bad data and restore the last saved state.</span>
<span class="n">new_grinder</span><span class="o">.</span><span class="n">restore</span><span class="p">()</span>
</pre></div>
</div>
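<p>To make the save/restore behavior above concrete, here is a toy model of it. <tt class="docutils literal"><span class="pre">ToyGrinder</span></tt> is hypothetical and is not how the Bird Grinder is implemented: “training” merely counts categories, and a plain dict stands in for the storage backend. It only illustrates the three rules described above: a new grinder loads any saved state, <tt class="docutils literal"><span class="pre">save()</span></tt> takes a snapshot, and <tt class="docutils literal"><span class="pre">restore()</span></tt> rolls back to it.</p>
<div class="highlight-python"><pre>import copy


class ToyGrinder(object):
    """Toy model of the save/restore behavior described above.

    Hypothetical stand-in, not Bird Grinder's implementation: training
    only counts categories, and a plain dict plays the storage backend.
    """

    def __init__(self, storage):
        self.storage = storage
        self.counts = {'good': 0, 'bad': 0}
        self.restore()  # a new grinder starts from any saved state

    def sharpen(self, tweet_json, category):
        self.counts[category] += 1

    def save(self):
        # Deep-copy so later training can't leak into the snapshot.
        self.storage['state'] = copy.deepcopy(self.counts)

    def restore(self):
        # If nothing has been saved yet, leave the current state alone.
        if 'state' in self.storage:
            self.counts = copy.deepcopy(self.storage['state'])</pre>
</div>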
<p>The Bird Grinder can save to and restore from several storage backends. The
default is JSON data stored in flat files.</p>
<div class="section" id="flat-files">
<h3><a class="toc-backref" href="#id6">Flat Files</a></h3>
<p>JSON data in flat files is used by default because it’s supported everywhere.
You can provide a filename when creating your grinder if you like – the
default is <tt class="docutils literal"><span class="pre">./birdgrinder.json</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>
<span class="c"># Saves/restores to/from ./birdgrinder.json</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">()</span>
<span class="c"># Saves/restores to/from /tmp/birdgrinder.json</span>
<span class="n">other_grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">filename</span><span class="o">=</span><span class="s">'/tmp/birdgrinder.json'</span><span class="p">)</span>
</pre></div>
</div>
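<p>This page doesn’t document the layout of the JSON file, so the sketch below treats the saved state as an arbitrary JSON-serializable dict. It shows one way to handle a flat JSON file safely: write to a temporary file in the same directory, then swap it into place with <tt class="docutils literal"><span class="pre">os.replace()</span></tt>, so readers see either the old file or the new one, never a half-written one. <tt class="docutils literal"><span class="pre">save_state()</span></tt> and <tt class="docutils literal"><span class="pre">load_state()</span></tt> are hypothetical helpers, not the Bird Grinder’s own code.</p>
<div class="highlight-python"><pre>import json
import os
import tempfile

DEFAULT_FILENAME = './birdgrinder.json'  # the default named above


def save_state(state, filename=DEFAULT_FILENAME):
    """Write a JSON-serializable state to filename, replacing it atomically."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Write to a temporary file in the same directory first...
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as f:
            json.dump(state, f)
        # ...then swap it into place in a single rename.
        os.replace(tmp_path, filename)
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind
        raise


def load_state(filename=DEFAULT_FILENAME):
    """Return the saved state, or None if nothing has been saved yet."""
    try:
        with open(filename) as f:
            return json.load(f)
    except FileNotFoundError:
        return None</pre>
</div>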
</div>
<div class="section" id="id1">
<h3><a class="toc-backref" href="#id7">Redis</a></h3>
<p>If you’re going to be saving and/or restoring frequently you may want to store
the data in <a class="reference external" href="http://code.google.com/p/redis/">Redis</a> for better performance.</p>
<p>To use Redis, pass <tt class="docutils literal"><span class="pre">storage='redis'</span></tt> when you create your Grinder, and
then call <tt class="docutils literal"><span class="pre">grinder.save()</span></tt> to save your data as usual. You can pass an
optional <tt class="docutils literal"><span class="pre">key_prefix</span></tt> argument to specify a custom prefix for the keys – the
default is <tt class="docutils literal"><span class="pre">birdgrinder</span></tt>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>
<span class="c"># Saves/restores to/from birdgrinder:*</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">'redis'</span><span class="p">)</span>
<span class="c"># Saves/restores to/from anothergrinder:*</span>
<span class="n">other_grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">'redis'</span><span class="p">,</span> <span class="n">key_prefix</span><span class="o">=</span><span class="s">'anothergrinder'</span><span class="p">)</span>
</pre></div>
</div>
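<p>The <tt class="docutils literal"><span class="pre">birdgrinder:*</span></tt> comments above use Redis glob notation: presumably every key a grinder writes starts with its prefix and a colon, which keeps two grinders sharing one Redis database apart. The sketch below mimics that namespacing in plain Python; <tt class="docutils literal"><span class="pre">make_key()</span></tt>, <tt class="docutils literal"><span class="pre">keys_for_prefix()</span></tt>, and the key names <tt class="docutils literal"><span class="pre">good</span></tt>/<tt class="docutils literal"><span class="pre">bad</span></tt> are hypothetical, since this page doesn’t list the keys the Bird Grinder actually writes.</p>
<div class="highlight-python"><pre>from fnmatch import fnmatchcase


def make_key(key_prefix, name):
    """Build a namespaced key such as 'birdgrinder:good' (hypothetical names)."""
    return '%s:%s' % (key_prefix, name)


def keys_for_prefix(all_keys, key_prefix):
    """Filter keys the way a Redis 'prefix:*' glob pattern would."""
    pattern = make_key(key_prefix, '*')
    return sorted(k for k in all_keys if fnmatchcase(k, pattern))</pre>
</div>
<p>The colon matters: <tt class="docutils literal"><span class="pre">birdgrinder:*</span></tt> does not match <tt class="docutils literal"><span class="pre">birdgrinderx:good</span></tt>. To inspect a real instance, redis-py’s <tt class="docutils literal"><span class="pre">Redis.scan_iter(match='birdgrinder:*')</span></tt> applies the same kind of glob pattern on the server.</p>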
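<p>If your Redis instance isn’t running on localhost, on the default port (6379), or with the default database number (0),
you can pass <tt class="docutils literal"><span class="pre">host</span></tt>, <tt class="docutils literal"><span class="pre">port</span></tt>, and <tt class="docutils literal"><span class="pre">db</span></tt> arguments as well:</p>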
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">birdgrinder</span> <span class="kn">import</span> <span class="n">Grinder</span>
<span class="n">grinder</span> <span class="o">=</span> <span class="n">Grinder</span><span class="p">(</span><span class="n">storage</span><span class="o">=</span><span class="s">'redis'</span><span class="p">,</span> <span class="n">host</span><span class="o">=</span><span class="s">'192.168.0.16'</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="mi">7000</span><span class="p">,</span> <span class="n">db</span><span class="o">=</span><span class="mi">2</span><span class="p">)</span>
</pre></div>
</div>
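<p>Deployments often describe a Redis server with a single URL such as <tt class="docutils literal"><span class="pre">redis://192.168.0.16:7000/2</span></tt>. The sketch below turns such a URL into the <tt class="docutils literal"><span class="pre">host</span></tt>, <tt class="docutils literal"><span class="pre">port</span></tt>, and <tt class="docutils literal"><span class="pre">db</span></tt> keyword arguments shown above, falling back to Redis’s usual defaults. <tt class="docutils literal"><span class="pre">redis_kwargs_from_url()</span></tt> is a hypothetical helper that handles only host, port, and database number (passwords are ignored).</p>
<div class="highlight-python"><pre>from urllib.parse import urlparse


def redis_kwargs_from_url(url):
    """Turn 'redis://host:port/db' into host/port/db keyword arguments.

    Missing parts fall back to Redis's usual defaults: localhost,
    port 6379 and database 0.
    """
    parts = urlparse(url)
    if parts.scheme != 'redis':
        raise ValueError('expected a redis:// URL, got %r' % (url,))
    db_path = parts.path.lstrip('/')
    return {
        'host': parts.hostname or 'localhost',
        'port': parts.port or 6379,
        'db': int(db_path) if db_path else 0,
    }</pre>
</div>
<p>Assuming <tt class="docutils literal"><span class="pre">Grinder</span></tt> accepts these arguments as in the example above, usage might look like <tt class="docutils literal"><span class="pre">Grinder(storage='redis',</span> <span class="pre">**redis_kwargs_from_url(os.environ['REDIS_URL']))</span></tt>, where <tt class="docutils literal"><span class="pre">REDIS_URL</span></tt> is a hypothetical environment variable.</p>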
</div>
</div>
<div class="section" id="advanced-usage">
<h2><a class="toc-backref" href="#id8">Advanced Usage</a></h2>
<p>TODO: Later.</p>
</div>
<div class="section" id="contributing">
<h2><a class="toc-backref" href="#id9">Contributing</a></h2>
<p>To contribute bug fixes, performance improvements, or new features, fork the
<a class="reference external" href="http://bitbucket.org/sjl/birdgrinder/">BitBucket repository</a> or <a class="reference external" href="http://github.com/sjl/birdgrinder/">GitHub
repository</a> and send a pull request.</p>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="footer">
© Copyright 2010, Steve Losh.
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.0.
</div>
</body>
</html>