.. The Bird Grinder documentation master file, created by
   sphinx-quickstart on Sat Jul 24 16:36:41 2010.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

The Bird Grinder
================

.. container:: tagline

   | A small `Python <http://python.org/>`_ library to cut through the spam & off-topic fat,
   | leaving only the **meatiest** tweets.

.. container:: quickstart

    ::

        from birdgrinder import Grinder

        grinder = Grinder(storage='redis')

        for tweet_json in good_tweets:
            grinder.sharpen(tweet_json, 'good')

        for tweet_json in bad_tweets:
            grinder.sharpen(tweet_json, 'bad')

        grinder.save()

        for tweet_json in unknown_tweets:
            print(grinder.grind(tweet_json))

    .. container:: get

       `Fork it on BitBucket ➙ <http://bitbucket.org/sjl/birdgrinder/>`_
       `Fork it on GitHub ➙ <http://github.com/sjl/birdgrinder/>`_

       `Report a bug ➙ <http://bitbucket.org/sjl/birdgrinder/issues/>`_


.. container:: intro

    Trying to grab tweets about a `version control system
    <http://hg-scm.org/>`_ and getting tweets about `some kind of shoe
    <http://en.wikipedia.org/wiki/Nike_Mercurial_Vapor>`_ in the mix? Use the
    Bird Grinder to filter out what you want.

    Sharpen your grinder by training it on tweets you've already categorized,
    then grind new tweets to find out if they pass the test.

    Save your personalized grinder to, and restore it from, a variety of
    backends like flat files or `Redis <http://code.google.com/p/redis/>`_.

    It's `MIT/X11 <http://en.wikipedia.org/wiki/MIT_License>`_ licensed.

.. contents::
   :local:


Installation
------------

Install with `pip <http://pip.openplans.org/>`_ from either the `Mercurial
<http://hg-scm.org>`_ or the `git <http://git-scm.com/>`_ repository (run one
of the two commands)::

    pip install -e hg+http://bitbucket.org/sjl/birdgrinder/#egg=birdgrinder
    pip install -e git+http://github.com/sjl/birdgrinder/#egg=birdgrinder

Basic Usage
-----------

The Bird Grinder helps you filter out spam and off-topic tweets you receive
from Twitter.

Before you can filter tweets, you'll need to create a ``Grinder`` and train
(or "sharpen") it with some tweets that you've marked as "good" and "bad"::

    from birdgrinder import Grinder

    grinder = Grinder()

    for tweet_json in bad_tweets:
        grinder.sharpen(tweet_json, 'bad')

    for tweet_json in good_tweets:
        grinder.sharpen(tweet_json, 'good')

The ``sharpen()`` method expects two arguments: a tweet (in the raw-JSON form
you received from Twitter) and a category (either ``'good'`` or ``'bad'``).

Feel free to determine the categories of these "training tweets" however you
like. You may want to manually categorize a number of tweets, or implement
some sort of user-powered rating system. It's up to you.
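
If your training tweets are already stored as ``(tweet_json, category)`` pairs,
a small wrapper can feed them to ``sharpen()`` and report how many tweets went
into each category. This is only a sketch: ``sharpen_from_labeled`` is
a hypothetical helper, not part of the Bird Grinder, and it relies on nothing
beyond the ``sharpen(tweet_json, category)`` call described above.

```python
VALID_CATEGORIES = ('good', 'bad')


def sharpen_from_labeled(grinder, labeled_tweets):
    """Train ``grinder`` from an iterable of (tweet_json, category) pairs.

    Hypothetical helper: it only relies on grinder.sharpen(tweet_json,
    category).  Returns the number of tweets used for each category.
    """
    counts = {'good': 0, 'bad': 0}
    for tweet_json, category in labeled_tweets:
        # Reject anything other than the two documented categories.
        if category not in VALID_CATEGORIES:
            raise ValueError('unknown category: %r' % (category,))
        grinder.sharpen(tweet_json, category)
        counts[category] += 1
    return counts
```

The returned counts may help you notice a lopsided training set before you
start grinding.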

Once you've sharpened your grinder you can use it to filter new tweets::

    for tweet_json in new_tweets:
        result = grinder.grind(tweet_json)

        if result == 'good':
            print('This tweet is satisfactory.')
        elif result == 'bad':
            print('This tweet is unacceptable.')
        elif result == 'unknown':
            print('Not enough information to classify this tweet.')

The ``grind()`` method returns one of three results: ``'good'``, ``'bad'``, or
``'unknown'``.
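
Because ``grind()`` has three possible results, a quick way to see how sharp
your grinder is might be to run it over tweets whose category you already know
and tally the outcomes. The ``tally_results`` function below is a hypothetical
helper, not part of the library, and it assumes the labeled tweets were held
back from training.

```python
def tally_results(grinder, labeled_tweets):
    """Compare grind() results with known categories.

    Hypothetical helper: ``labeled_tweets`` is an iterable of
    (tweet_json, expected_category) pairs that were NOT used for training.
    """
    tally = {'correct': 0, 'wrong': 0, 'unknown': 0}
    for tweet_json, expected in labeled_tweets:
        result = grinder.grind(tweet_json)
        if result == 'unknown':
            # The grinder declined to classify this tweet.
            tally['unknown'] += 1
        elif result == expected:
            tally['correct'] += 1
        else:
            tally['wrong'] += 1
    return tally
```

A large ``'unknown'`` count would suggest the grinder needs more training
tweets before its results are useful.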

Saving and Restoring
--------------------

Once you get your grinder nice and sharp you'll probably want to save it so you
can use it again later.

If some data is already saved then creating a new grinder will initialize it
with that data.  To save the data call ``grinder.save()``, and to restore from
the last saved state call ``grinder.restore()``::

    from birdgrinder import Grinder

    grinder = Grinder()

    # ... train ...

    grinder.save()

    # Create a new grinder and restore the saved data automatically.
    new_grinder = Grinder()

    # ... train with bad data ...

    # Throw away the bad data and restore the last saved state.
    new_grinder.restore()

The Bird Grinder can save and restore to and from a number of formats. The
default is JSON data stored in flat files.
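
For long training runs you may prefer to save every so often rather than only
at the end, so an interrupted run loses less work. The sketch below uses only
the ``sharpen()`` and ``save()`` methods described above;
``sharpen_with_checkpoints`` and its ``every`` parameter are hypothetical names
chosen for this example.

```python
def sharpen_with_checkpoints(grinder, labeled_tweets, every=100):
    """Train on (tweet_json, category) pairs, saving every ``every`` tweets.

    Hypothetical helper, not part of the Bird Grinder.  Returns the number
    of tweets the grinder was trained on.
    """
    trained = 0
    for tweet_json, category in labeled_tweets:
        grinder.sharpen(tweet_json, category)
        trained += 1
        if trained % every == 0:
            grinder.save()
    # Save whatever is left over from the last, partial batch.
    if trained % every != 0:
        grinder.save()
    return trained
```

If saving this often turns out to be slow with flat files, a smaller number of
checkpoints (a larger ``every``) is the simplest adjustment.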

Flat Files
''''''''''

JSON data in flat files is used by default because it's supported everywhere.
You can provide a filename when creating your grinder if you like -- the
default is ``./birdgrinder.json``::

    from birdgrinder import Grinder

    # Saves/restores to/from ./birdgrinder.json
    grinder = Grinder()

    # Saves/restores to/from /tmp/birdgrinder.json
    other_grinder = Grinder(filename='/tmp/birdgrinder.json')

Redis
'''''

If you're going to be saving and/or restoring frequently you may want to store
the data in `Redis <http://code.google.com/p/redis/>`_ for better performance.

To use Redis you can pass ``storage='redis'`` when you create your Grinder, and
then use ``grinder.save()`` to save your data as usual. You can pass an
optional ``key_prefix`` argument to specify a custom prefix for the keys -- the
default is ``birdgrinder``::

    from birdgrinder import Grinder

    # Saves/restores to/from birdgrinder:*
    grinder = Grinder(storage='redis')

    # Saves/restores to/from anothergrinder:*
    other_grinder = Grinder(storage='redis', key_prefix='anothergrinder')

If your Redis instance isn't running on localhost, on the default port, or
with the default database number, pass ``host``, ``port`` and ``db`` as needed::

    from birdgrinder import Grinder

    grinder = Grinder(storage='redis', host='192.168.0.16', port=7000, db=2)
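
If the connection details differ between machines, one option is to collect
them from environment variables and pass them on as keyword arguments. In the
sketch below the ``BIRDGRINDER_*`` variable names are invented for this
example, and the fallbacks of ``localhost``, port ``6379`` and database ``0``
are Redis's usual defaults rather than values confirmed from the Bird Grinder
source. It also assumes ``key_prefix`` can be combined with ``host``, ``port``
and ``db``.

```python
import os


def redis_grinder_kwargs(environ=None):
    """Build keyword arguments for Grinder(storage='redis', ...).

    The BIRDGRINDER_* variable names and the fallback values are
    assumptions made for this example, not something the library reads.
    """
    if environ is None:
        environ = os.environ
    return {
        'storage': 'redis',
        'host': environ.get('BIRDGRINDER_REDIS_HOST', 'localhost'),
        # Environment variables are strings, so convert the numeric ones.
        'port': int(environ.get('BIRDGRINDER_REDIS_PORT', '6379')),
        'db': int(environ.get('BIRDGRINDER_REDIS_DB', '0')),
        'key_prefix': environ.get('BIRDGRINDER_KEY_PREFIX', 'birdgrinder'),
    }
```

You would then create the grinder with
``Grinder(**redis_grinder_kwargs())``.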

Advanced Usage
--------------

TODO: Later.

Contributing
------------

To contribute bug fixes, performance improvements or new features just fork the
`BitBucket repository <http://bitbucket.org/sjl/birdgrinder/>`_ or `GitHub
repository <http://github.com/sjl/birdgrinder/>`_ and send a pull request.

.. author: Steve Losh <steve@stevelosh.com>
   date: Tue, 10 Apr 2012 10:19:57 -0400