| author | Steve Losh <steve@stevelosh.com> |
|---|---|
| date | Tue, 10 Apr 2012 10:19:57 -0400 |
.. The Bird Grinder documentation master file, created by
   sphinx-quickstart on Sat Jul 24 16:36:41 2010.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

The Bird Grinder
================

.. container:: tagline

    | A small `Python <http://python.org/>`_ library to cut through the spam & off-topic fat,
    | leaving only the **meatiest** tweets.

.. container:: quickstart

    ::

        from birdgrinder import Grinder

        grinder = Grinder(storage='redis')

        for tweet_json in good_tweets:
            grinder.sharpen(tweet_json, 'good')
        for tweet_json in bad_tweets:
            grinder.sharpen(tweet_json, 'bad')

        grinder.save()

        for tweet_json in unknown_tweets:
            print(grinder.grind(tweet_json))

.. container:: get

    `Fork it on BitBucket ➙ <http://bitbucket.org/sjl/birdgrinder/>`_
    `Fork it on GitHub ➙ <http://github.com/sjl/birdgrinder/>`_
    `Report a bug ➙ <http://bitbucket.org/sjl/birdgrinder/issues/>`_

.. container:: intro

    Trying to grab tweets about a `version control system <http://hg-scm.org/>`_
    and getting tweets about `some kind of shoe <http://en.wikipedia.org/wiki/Nike_Mercurial_Vapor>`_
    in the mix? Use the Bird Grinder to keep only the tweets you want.

    Sharpen your grinder by training it with tweets you've already marked as
    good or bad, then grind new tweets to find out whether they pass the test.

    Save your personalized grinder to, and restore it from, backends such as
    flat files or `Redis <http://code.google.com/p/redis/>`_.

    It's `MIT/X11 <http://en.wikipedia.org/wiki/MIT_License>`_ licensed.

.. contents::
   :local:

Installation
------------

Install with `pip <http://pip.openplans.org/>`_ plus either `Mercurial <http://hg-scm.org>`_
or `git <http://git-scm.com/>`_, using one of these commands::

    pip install -e hg+http://bitbucket.org/sjl/birdgrinder/#egg=birdgrinder
    pip install -e git+http://github.com/sjl/birdgrinder/#egg=birdgrinder

Basic Usage
-----------

The Bird Grinder helps you filter out spam and off-topic tweets you receive
from Twitter.
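
At its simplest, filtering means running each incoming tweet through a
classifier and keeping only the ones it accepts. The following is a minimal
sketch of that loop, not part of Bird Grinder itself: ``keep_meaty()``,
``fake_grind()`` and the ``keep_unknown`` flag are hypothetical names, and the
stand-in classifier only mimics the three results (``'good'``, ``'bad'``,
``'unknown'``) that a sharpened grinder's ``grind()`` returns, so the example
runs without Bird Grinder installed::

    import json


    def keep_meaty(tweets, classify, keep_unknown=False):
        """Yield only the raw-JSON tweets that `classify` accepts.

        `classify` takes one raw-JSON tweet and returns 'good', 'bad' or
        'unknown'. Tweets rated 'unknown' are dropped unless keep_unknown is True.
        """
        accepted = {'good', 'unknown'} if keep_unknown else {'good'}
        for tweet_json in tweets:
            if classify(tweet_json) in accepted:
                yield tweet_json


    def fake_grind(tweet_json):
        # Hypothetical stand-in for a sharpened grinder: a crude keyword check.
        words = json.loads(tweet_json)['text'].lower().split()
        if 'hg' in words:
            return 'good'
        if 'shoes' in words:
            return 'bad'
        return 'unknown'


    tweets = [
        '{"text": "hg rebase just saved my afternoon"}',
        '{"text": "Mercurial Vapor shoes on sale"}',
        '{"text": "Thinking about Mercurial"}',
    ]

    print(list(keep_meaty(tweets, fake_grind)))
    # prints ['{"text": "hg rebase just saved my afternoon"}']

With a sharpened grinder you would pass ``grinder.grind`` as ``classify``.
Whether ``'unknown'`` tweets deserve to be kept is a judgment call, so the
sketch leaves that decision to the caller.
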

Before you can filter the tweets you want, you'll need to create a
``Grinder`` and train (or "sharpen") it with some tweets that you've marked
as "good" and "bad"::

    from birdgrinder import Grinder

    grinder = Grinder()

    for tweet_json in bad_tweets:
        grinder.sharpen(tweet_json, 'bad')

    for tweet_json in good_tweets:
        grinder.sharpen(tweet_json, 'good')

The ``sharpen()`` method expects two arguments: a tweet (in the raw JSON form
you received from Twitter) and a category (either ``'good'`` or ``'bad'``).

How you categorize these "training tweets" is up to you. You might label a
batch of tweets by hand, or implement some sort of user-powered rating system.

Once you've sharpened your grinder, you can use it to filter new tweets::

    for tweet_json in new_tweets:
        result = grinder.grind(tweet_json)

        if result == 'good':
            print('This tweet is satisfactory.')
        elif result == 'bad':
            print('This tweet is unacceptable.')
        elif result == 'unknown':
            print('Not enough information to classify this tweet.')

The ``grind()`` method returns one of three results: ``'good'``, ``'bad'``,
or ``'unknown'``.

Saving and Restoring
--------------------

Once you get your grinder nice and sharp, you'll probably want to save it so
you can use it again later.

If some data has already been saved, creating a new grinder will initialize
it with that data. To save the current data call ``grinder.save()``, and to
restore the last saved state call ``grinder.restore()``::

    from birdgrinder import Grinder

    grinder = Grinder()
    # ... train ...
    grinder.save()

    # Creating a new grinder restores the saved data automatically.
    new_grinder = Grinder()

    # ... train with bad data ...

    # Throw away the bad data and restore the last saved state.
    new_grinder.restore()

The Bird Grinder can save to and restore from a number of formats. The
default is JSON data stored in flat files.

Flat Files
''''''''''

JSON data in flat files is used by default because it's supported everywhere.
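
The save/restore behavior described above (``save()`` writes the current
state, ``restore()`` returns to the last saved state, and a new grinder picks
up existing data) is easy to picture with a flat-file backend. Here is a
minimal sketch using only the standard library ``json`` module. It is not
Bird Grinder's actual storage code: ``FlatFileStore``, its ``add()`` method
and the per-category word-count layout of the stored data are hypothetical,
chosen only to have something concrete to save::

    import json
    import os
    import tempfile


    class FlatFileStore(object):
        """Hypothetical flat-file backend: per-category word counts kept in
        memory and persisted to a single JSON file."""

        def __init__(self, filename='./birdgrinder.json'):
            self.filename = filename
            self.counts = {'good': {}, 'bad': {}}
            # Like a new Grinder, start from previously saved data if any exists.
            if os.path.exists(self.filename):
                self.restore()

        def add(self, category, word):
            # Record one occurrence of `word` in a 'good' or 'bad' training tweet.
            bucket = self.counts[category]
            bucket[word] = bucket.get(word, 0) + 1

        def save(self):
            # Overwrite the file with the current in-memory state.
            with open(self.filename, 'w') as f:
                json.dump(self.counts, f)

        def restore(self):
            # Discard unsaved changes and reload the last saved state.
            with open(self.filename) as f:
                self.counts = json.load(f)


    path = os.path.join(tempfile.mkdtemp(), 'birdgrinder.json')

    store = FlatFileStore(path)
    store.add('good', 'hg')
    store.save()

    fresh = FlatFileStore(path)   # picks up the saved counts automatically
    fresh.add('bad', 'shoes')     # "train with bad data"
    fresh.restore()               # ...and throw it away again
    print(fresh.counts)           # {'good': {'hg': 1}, 'bad': {}}

In this sketch every ``save()`` rewrites the whole file, which is cheap for
small grinders but is the kind of cost that makes a server-backed store
attractive when you save often.
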

You can provide a filename when creating your grinder if you like; the
default is ``./birdgrinder.json``::

    from birdgrinder import Grinder

    # Saves/restores to/from ./birdgrinder.json
    grinder = Grinder()

    # Saves/restores to/from /tmp/birdgrinder.json
    other_grinder = Grinder(filename='/tmp/birdgrinder.json')

Redis
'''''

If you're going to be saving and/or restoring frequently, you may want to
store the data in `Redis <http://code.google.com/p/redis/>`_ for better
performance.

To use Redis, pass ``storage='redis'`` when you create your grinder, then
call ``grinder.save()`` to save your data as usual. You can pass an optional
``key_prefix`` argument to specify a custom prefix for the keys; the default
is ``birdgrinder``::

    from birdgrinder import Grinder

    # Saves/restores to/from birdgrinder:*
    grinder = Grinder(storage='redis')

    # Saves/restores to/from anothergrinder:*
    other_grinder = Grinder(storage='redis', key_prefix='anothergrinder')

If your Redis instance isn't running on localhost on the default port with
the default database number, you can change those settings as well::

    from birdgrinder import Grinder

    grinder = Grinder(storage='redis', host='192.168.0.16', port=7000, db=2)

Advanced Usage
--------------

TODO: Later.

Contributing
------------

To contribute bug fixes, performance improvements or new features, fork the
`BitBucket repository <http://bitbucket.org/sjl/birdgrinder/>`_ or the
`GitHub repository <http://github.com/sjl/birdgrinder/>`_ and send a pull request.