Archive for October, 2007

reCaptcha

This is awesome, and soon to be a part of our blog sites!

http://recaptcha.net/

They are using scanned words from old books that OCR (optical character recognition) software was unable to decipher correctly. By feeding these words to humans, Carnegie Mellon is helping OCR software get better. :-)

Hmmm. I just read through the reCAPTCHA terms of service, and I wonder whether a lawyer actually wrote this:

Ownership of Data and Information; Carnegie Mellon’s Use of Personally-Identifiable Information. All data and information generated from the access and use of the Website and Service, including any image solved (whether or not correct) shall be the property of Carnegie Mellon, and no third party, including you, shall have the right to own such data and information, or use such data or information except as expressly authorized by these Terms and Conditions. By using the Website and Service, you automatically assign to Carnegie Mellon any rights in the data or information generated from the access and use of the Website and Service, including any image solved (whether or not correct) of yours and third party users of your website providing interpretations of images (and you agree to make sure that the third party users of your website assign these rights in the data and information to Carnegie Mellon).

That’s one of the more ridiculous pieces of fine print I’ve read in a while. How in the world could any site owner sanely agree to “make sure that third party users of [their] website assign [those] rights in the data and information to Carnegie Mellon”? That is practically impossible.

Sorry Carnegie Mellon, not everybody wins, especially with wacky terms like the ones you are including in your service.

I believe that the mod_defensible solution described here is a better solution than both Spam Karma 2 and reCAPTCHA. Of course I’ll have to try it out!

More Python Explorations: TinyERP

In my work with PBooks, I occasionally do some research on other bookkeeping programs. A forum member encouraged me to examine TinyERP. I had come across it in the past, but I didn’t investigate too deeply because I didn’t know much about Python at the time.

Recently I’ve had some positive experiences with Trac, which is written in Python, so I decided to take another look. TinyERP isn’t so tiny. It’s pretty big and complicated in my opinion, and has a lot of components and dependencies. I was able to install the server easily enough thanks to Debian having it in their repository, but the web client is still in active development, so that was a manual task.

I learned about the Python Cheese Shop, which I guess is sort of like PHP’s PEAR, Perl’s CPAN, and Ruby’s gems. With that, I installed some “eggs”: TurboGears, CherryPy, and matplotlib.

After that, I did a little exploring of the Python site and the Python package index, and read up on what CherryPy is all about. It’s a web application server, and it seems pretty cool, but I’m more apt to use Apache and FastCGI, like I do with Trac.

From there I looked up some XSLT libraries for Python. It’s nice to know that the quality libxml and libxslt C libraries come with Python bindings, so there is a great interface to them from Python.

Ubuntu 8 Plans Webapp Deployments?

I just took a peek at the Ubuntu 8 blueprints and one of the line items is an easy way to package, install, and customize webapps. If only that were possible!

Apache2 XSLT

I’m trying out the libapache2-modxslt module, which is written in C and lets Apache apply XSLT stylesheets to XML documents. :-)
I just installed it on an Ubuntu machine and am going to follow their instructions when trying it out for the first time.

sudo apt-get install libapache2-modxslt
sudo a2enmod modxslt

I had difficulty with their documentation, but I was able to figure out the basics. Here’s what I used in my config:

    <Directory /var/www/public/xslt>
        Order deny,allow
        Deny from all
        Allow from 192.168
        SetOutputFilter mod-xslt
        AddType text/xml .xml
        XSLTSetStylesheet text/xml /var/www/public/test.xsl
    </Directory>

test.xml

<?xml version="1.0"?>
<top>
    <a>blah</a>
</top>

test.xsl

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" indent="yes" encoding="UTF-8" omit-xml-declaration="no" 
doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"
doctype-system="http://www.w3.org/TR/html4/loose.dtd"/>
<xsl:template match="/">
<html>
<head>
     <title>XSLT Test</title>
</head>
<body>
<div id="container">
<xsl:value-of select="//a"/>
</div>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
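
As a quick sanity check outside Apache, here is a minimal Python sketch of what the stylesheet above should put inside the container div for test.xml. It is not an XSLT processor: it only uses the standard library's xml.etree.ElementTree to approximate `//a` on this simple document. The function names `select_a_text` and `expected_body` are hypothetical helpers made up for this illustration, and the div is rebuilt by hand rather than produced by modxslt.

```python
import xml.etree.ElementTree as ET

# Same content as test.xml above.
TEST_XML = """<?xml version="1.0"?>
<top>
    <a>blah</a>
</top>
"""


def select_a_text(xml_source):
    """Approximate <xsl:value-of select="//a"/> for simple documents.

    Returns the text content of the first <a> element below the root,
    or an empty string if there is none. (Hypothetical helper; it does
    not handle the case where the root element itself is <a>.)
    """
    root = ET.fromstring(xml_source)
    node = root.find(".//a")
    if node is None:
        return ""
    # The XPath string value of an element is all of its descendant text.
    return "".join(node.itertext())


def expected_body(xml_source):
    """Build the div the stylesheet's template is expected to emit."""
    return '<div id="container">{}</div>'.format(select_a_text(xml_source))


if __name__ == "__main__":
    print(expected_body(TEST_XML))
```

If the page served by Apache does not contain that div, the filter is probably not being applied to the request at all.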

In my test, Apache kept serving a cached copy of the XSL file even through a reboot and a change to the file’s modification time. Hmmm.

Also, modxslt appends this comment to the end of every document it produces, which is pretty cheesy: