Why I love Python so much: How it all started

This is something I always wanted to write about: why I picked Python as my programming language and why I fell in love with it instantly. I’ll break the post into parts to keep each one short. This first part is about how I began with Python.

It all started six years ago at the university, in a subject called syntax and semantics of programming languages. During one of the classes the professor made a passing mention of a language that uses indentation to delimit blocks of code. It was called Python, he said. That was all I needed to start my research and the most amazing journey of my life.

Several days later I started reading tutorials and watching YouTube videos about the basics of Python. It took me a while to grasp all the concepts of the language. At that time I didn’t really know much about functional and object-oriented programming; I was just starting at the university with procedural programming in languages like Pascal and C.

Once I finally understood the basic data types and the main features of the language, they simply blew my mind. The semantics were amazing and the syntax was so clear. Maybe the professor mentioned this language for a reason. I wish he had gone deeper in order to enlighten more students, but for me that was just enough.

The years passed and I gained a lot more experience in Python and, of course, in coding techniques, paradigms, methodologies, patterns, etc. Nothing pulled me far enough from my initial path to make me change my programming language. On the contrary, the more I learn about programming, the more attached I get to Python, to the point that I feel I will be using it until the end of my days. I’m truly in love.

In the next post I’ll go deeper into the technical aspects that I like the most about Python. So keep reading 🙂

Filenergy – A simple file sharing tool written in python using flask

Recently I started looking at Flask as a fully-featured framework ready to compete with Django. And I don’t mean just for simple applications that don’t need to access a database. With the amazing Flask-Migrate package, and thanks to SQLAlchemy and Alembic, we are now able to apply migrations on production databases just like we used to do with Django’s South.

Here is the result of my first complete app using Flask and the stack I mentioned above: a simple file sharing tool.

I’ll attach some screenshots here, but I think the app speaks for itself. It’s really simple to use and it can be very useful for sharing files with other people, or just with yourself as a storage service.

You can find a live demo at http://filenergy.crawley-cloud.com/ (In case you want some customization I’d love to work for you making the product suitable to your needs!)

Of course it’s open source and the code is available at https://github.com/jmg/filenergy.

[Screenshots of the Filenergy app]

Hope you enjoy it!

Django Deployment

I’ve been working on a tool to make Django apps really easy to deploy. Just write a config file and run a command specifying the directory of the app. I think it will deserve another post when it’s finished. In the meantime you can take a look at the code at https://github.com/jmg/django_deployment. It’s based on Fabric, the amazing SSH tool for Python.

Any feedback or contribution would be appreciated!

My last one-weekend project of 2012

On the last weekend of 2012 I wanted to build a project before the year ended. So last Friday, 28/12, I started coding this new idea: a GitHub repo aggregator for open source projects. Think of it as a Hacker News-like site, but for projects hosted on GitHub.

It was challenging to build an application in such a short time, but recently I’ve participated in many hackathons where the development sprint is about 10 to 48 hours of non-stop coding. So I’m kind of used to doing this stuff now, and I like it!

The good thing about one-weekend projects is that you can keep the focus on one thing. Just one thing. As the Unix philosophy says, do just one thing but do it really well. I think hackathon projects follow the same pattern: in a very short time you have to focus on solving one problem, but in a pretty new and amazing way.

My new idea, hackersprojects, aims to solve the fragmentation in the open source community by giving developers a way to share, discover, vote on, comment on and contribute to open source projects. The site is pretty similar to Hacker News, and the algorithm used to rank trending projects was taken from this analysis of the Hacker News ranking algorithm.
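The analysis itself isn’t reproduced in this post, so take the following as a rough sketch under an assumption: it uses the formula most commonly cited for Hacker News, score = (points - 1) / (age_hours + 2) ^ gravity with gravity = 1.8. The function names and the sample projects are made up for illustration, and the code actually running on hackersprojects may differ.

```python
def hn_score(points, age_hours, gravity=1.8):
    """Commonly cited Hacker News style score (an assumption, see above).

    Votes push a project up, while age pulls it down faster and faster.
    """
    return (points - 1) / (age_hours + 2) ** gravity


def rank_projects(projects, gravity=1.8):
    """Return the projects sorted from most to least trending."""
    return sorted(
        projects,
        key=lambda p: hn_score(p["points"], p["age_hours"], gravity),
        reverse=True,
    )


# Hypothetical sample data, only to exercise the ranking.
sample = [
    {"name": "old-popular", "points": 100, "age_hours": 48},
    {"name": "fresh", "points": 10, "age_hours": 1},
    {"name": "unvoted", "points": 1, "age_hours": 0},
]

for project in rank_projects(sample):
    print(project["name"], hn_score(project["points"], project["age_hours"]))
```

The point of the gravity exponent is that a fresh project with a handful of votes can outrank an older one with many more votes, which keeps the front page moving.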

The project is completely open source, so you can take a look at the repo at https://github.com/jmg/hackersprojects and, why not, send a pull request to contribute something you may find useful. It will be appreciated. The idea is to make this project grow with the community.

Nothing more to say, just enjoy it! It has been up and running since 9PM 31/12 UTC. So I guess I accomplished my goal, which was to have a first working implementation before the year ended.

Here is the link: http://www.hackersprojects.com/

Regards and happy new year of coding!

Embedded Chat

Hey guys,

I want to announce the release of a new social embedded chat service. My colleague and I have been working on this over the last few months, and now we are very happy to make it public as a closed beta service.

You can check it out on our live site at http://www.embedded-chat.com and give us a try ;-). As we are in private beta this is totally FREE, and you’ll get many benefits when we release the final version if you sign up now.

For the tech people reading this, I’ll write another post going deep into the technologies we are using in this new service. One hint: Python and Node.js can work really well together ;-). Also, you can write all your code in a “pythonic” way if you use CoffeeScript instead of JavaScript on the Node server side. Don’t miss my next post about the geeky side of this project.

Facebook JS SDK for login and python backend api calls with Pyfb.

Sometimes you don’t want a redirect from your site to Facebook just to perform the login. Fortunately, the solution is simple: Facebook provides a login via a popup through the JS SDK. The only big problem is that you then have to make the API calls with JavaScript right on the client side. This is not the best choice at all. If you aren’t careful, your application might become very vulnerable.

That’s why I’ll show you how to use the JS SDK just for login and make the API calls from Python backend code using the library I wrote, Pyfb.

First of all you need to write the index.html where the code to achieve the login will be located. It would look like this:

<html>
    <head><title>Facebook Login with JS SDK</title>
    </head>
    <body>
    <div id="fb-root"></div>
    <script>

        function isConnected(response) {
            return response.status == 'connected';
        }

        function getLoginStatus(FB) {

            FB.getLoginStatus(function(response) {

                if (isConnected(response)) {
                    onLogin(response);
                }
                else {
                    FB.login(onLogin);
                }
            });
        }

        function onLogin(response) {

            if (isConnected(response)) {
                // Trailing slash matches the pattern in urls.py; the token is URL-encoded
                location.href = '/facebook_javascript_login_sucess/?access_token=' + encodeURIComponent(response.authResponse.accessToken);
            }
        }

        window.fbAsyncInit = function() {

            FB.init({
                appId      : '{{FACEBOOK_APP_ID}}',
                channelUrl : 'http://localhost:8000/media/channel.html',
                status     : true,
                cookie     : true,
                xfbml      : true,
                oauth      : true,
            });

        };

        (function(d){
             var js, id = 'facebook-jssdk'; if (d.getElementById(id)) {return;}
             js = d.createElement('script'); js.id = id; js.async = true;
             js.src = "http://connect.facebook.net/en_US/all.js";
             d.getElementsByTagName('head')[0].appendChild(js);
        }(document));

    </script>

        <button onclick="getLoginStatus(FB)">Facebook Javascript Login</button>
    </body>
</html>

As you can see, the login callback function (onLogin) receives the access token. This token will allow you to make backend calls, so don’t lose it! I’d recommend saving it in the session or storing it in the database every time a user logs in.

I will be using Django for this example, but you could use whatever you want for the backend. The Django views.py file would look like this:

from pyfb import Pyfb
from django.http import HttpResponse
from django.shortcuts import render_to_response
from django.utils.html import escape

from settings import FACEBOOK_APP_ID, FACEBOOK_SECRET_KEY

def index(request):
    return render_to_response("index.html", {"FACEBOOK_APP_ID": FACEBOOK_APP_ID})

# Login with the JS SDK and backend queries with Pyfb
def facebook_javascript_login_sucess(request):

    # The token arrives as a query-string parameter set by onLogin() in index.html
    access_token = request.GET.get("access_token")

    facebook = Pyfb(FACEBOOK_APP_ID)
    facebook.set_access_token(access_token)

    return _render_user(facebook)

def _render_user(facebook):

    # Ask the Graph API for the profile of the user who owns the token
    me = facebook.get_myself()

    # Escape the name before embedding it in the HTML response
    welcome = "Welcome <b>%s</b>. Your Facebook login has been completed successfully!"
    return HttpResponse(welcome % escape(me.name))

Finally just configure the urls.py:

urlpatterns = patterns('',
    (r'^$', 'djangoapp.django_pyfb.views.index'),
    (r'^facebook_javascript_login_sucess/$', 'djangoapp.django_pyfb.views.facebook_javascript_login_sucess'),
)

And don’t forget to have the proper configuration constants in your settings.py:

# Facebook related Settings
FACEBOOK_APP_ID = 'YOUR_APP_ID'
FACEBOOK_SECRET_KEY = 'YOUR_APP_SECRET_CODE'

That’s it! Enjoy the Facebook Graph API!

Proxy Dispatcher implemented in PHP

I want to share a piece of code which might be very useful when you have to deal with object introspection in PHP. I played for years with Python’s introspection system and I loved it.

But now I’m back on PHP, a language that has very good metaprogramming tools but which, from my point of view, is less pragmatic than Python or Ruby in this respect (and maybe in almost all respects).

In this piece of code I’m trying to replace Python’s *args with the PHP function call_user_func_array. The functionality behind these different implementations is very similar in the end. But I still think Python’s approach is far better =).

Let the code talk:

/**
* Proxy Dispatcher using php call_user_func_array (http://us2.php.net/manual/en/function.call-user-func-array.php)
* */

class Foo {

    function bar1($arg, $arg2, $arg3, $arg4) {
         return "arg: $arg, arg2: $arg2, arg3: $arg3, arg4: $arg4\n";
    }
    function bar2($arg, $arg2) {
        return "arg: $arg, arg2: $arg2\n";
    }
    function bar3($arg) {
        return "arg: $arg\n";
    }
}

class FooWrapper {

    // Declared explicitly so the wrapped object is not a dynamic property
    private $_foo;

    public function __construct() {
        $this->_foo = new Foo();
    }

    public function __call($method, $arguments) {
        return call_user_func_array(array($this->_foo, $method), $arguments);
    }
}

$fooWrapper = new FooWrapper();
echo $fooWrapper->bar1(1,2,3,4);
echo $fooWrapper->bar2(1,2);
echo $fooWrapper->bar3(1);

And here is the Python code for the same:

class Foo(object):

    def bar1(self, arg, arg2, arg3, arg4):
        print("arg: %s, arg2: %s, arg3: %s, arg4: %s" % (arg, arg2, arg3, arg4))

    def bar2(self, arg, arg2):
        print("arg: %s, arg2: %s" % (arg, arg2))

    def bar3(self, arg):
        print("arg: %s" % arg)


class FooWrapper(object):

    foo = Foo()

    def __getattr__(self, name):
        return lambda *args, **kwargs: getattr(self.foo, name)(*args, **kwargs)


fooWrapper = FooWrapper()
fooWrapper.bar1(1,2,3,4)
fooWrapper.bar2(1,2)
fooWrapper.bar3(1)

Crawley Cloud

I want to share this new web site I’m building. It’s called crawley cloud and it will be a crawling and scraping network built on top of the crawley framework.

The original idea of crawley cloud emerged from the lack of user-friendly interfaces that allow everyone to search and extract data from the internet. The main goal of this network will be to provide the user with a bunch of tools (like a customized web browser) that make searching and extracting data from web sites easy.

Users will be able to register on this site, download these tools and store their projects on the server. They will also be able to store the extracted data in their accounts and access it whenever they want!

We’re thinking about presenting the extracted data in real time too, which will make the task more interactive.

And most important: it will all be based on an open source framework, so you are able to contribute whenever you wish!

It’s just the beginning of the project, so if you need to extract data from a specific web site right now you can contact us and we will be glad to help you! Just go to our contact page and send us an email with the details.

Keep reading for updates!

Crawley – A Scraping / Crawling Framework Built On Eventlet

A few weeks ago I started a new project. It’s a crawling / scraping framework that aims to make it easy to extract data from the web and store it in a relational database.

Today I released the early version 0.0.4 and wrote several examples which explain what the framework can do. I promise to add more real-world examples and more documentation in the coming days. In the meantime you can follow the project’s progress on the official repository at GitHub and play with the examples.

You can also install crawley with pip by running:

~$ pip install crawley 

and check the documentation.

That’s all for now. Keep watching the repository =).

Non-blocking I/O, Node Js and Python’s Eventlet

Non-blocking I/O and Node JS

A while ago I did some research on non-blocking I/O. I started with Node Js (a non-blocking I/O framework built on Google Chrome’s JS engine, intended for writing highly scalable networking applications), and I was surprised by how quickly an HTTP server built with this framework can handle a thousand concurrent requests, and with very efficient memory usage.

This is possible because Node Js doesn’t start a new thread or process when a new request comes to the server. Everything in Node Js runs in a single thread and nothing blocks. It makes asynchronous I/O calls and tells the operating system to notify it when the I/O task is completed, using epoll (Linux), kqueue (FreeBSD), select or whatever your OS provides for this kind of thing. In the meantime, Node Js can continue processing other requests or doing extra stuff. It never ever blocks.
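To make the idea concrete from the Python side, here is a minimal sketch of the same readiness-notification pattern using the standard library’s selectors module, which picks epoll, kqueue or select depending on the OS. This is not how Node Js is implemented; the function name is hypothetical and socketpair() stands in for real client connections so that no network is needed.

```python
import selectors
import socket


def echo_ready_sockets(server_ends, expected_messages, timeout=1.0):
    """Serve several connections from one thread, touching only ready sockets.

    Each incoming message is echoed back in upper case. Returns the number
    of messages handled.
    """
    selector = selectors.DefaultSelector()  # epoll, kqueue or select, per OS
    for sock in server_ends:
        sock.setblocking(False)
        selector.register(sock, selectors.EVENT_READ)

    handled = 0
    try:
        while handled < expected_messages:
            # Ask the OS which sockets have data; nothing is polled by hand.
            ready = selector.select(timeout)
            if not ready:
                break  # nothing became readable in time
            for key, _mask in ready:
                payload = key.fileobj.recv(1024)
                if payload:
                    # Fine for tiny payloads; a real server would also wait
                    # for write readiness before sending large responses.
                    key.fileobj.sendall(payload.upper())
                    handled += 1
                else:
                    selector.unregister(key.fileobj)  # peer closed
    finally:
        selector.close()
    return handled


# Three pretend connections, all served by the single loop above.
pairs = [socket.socketpair() for _ in range(3)]
server_ends = [server for server, _client in pairs]
client_ends = [client for _server, client in pairs]

for number, client in enumerate(client_ends):
    client.sendall(("ping %d" % number).encode())

handled_count = echo_ready_sockets(server_ends, expected_messages=3)
replies = [client.recv(1024) for client in client_ends]

for server, client in pairs:
    server.close()
    client.close()
```

No thread is created per connection: the loop sleeps inside select() until the OS reports that at least one socket is readable, which is the core of the model described above.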

Another remarkable thing is that you don’t have a separate stack for each connection, since you don’t have threads. That saves a huge amount of memory when you have high concurrency levels on your server.

Read more at Node’s official page. It’s a very promising project and it’s in its early stages.

The issues of Non-blocking Node Js programming model 

An issue with this programming model is that your code must be written as a set of callbacks that are invoked when the I/O operation is done. To be more explicit, let’s look at this example:

var http = require("http")

var server = http.createServer(function (req, res) {

    http.get({ 'host' : 'google.com'}, function (google_response) {

        setTimeout(function () {
            res.end(google_response.headers['location'])
        }, 2000)
    })

    res.writeHead(200, {'Content-Type': 'text/plain'})
    res.write("hello ")
})

server.listen(8000)

The code just runs a server on localhost at port 8000. When you make a request to http://localhost:8000 it will write “hello”, make an HTTP GET request to Google, wait for 2 seconds and then print the location header. Note that I wrote the code using callback functions. Normally in Node Js, almost all your code looks like this.
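For readers more at home in Python, the same callback shape can be sketched with the standard library’s asyncio event loop. This is only an illustration of the nesting, not a port of the server: fake_http_get is a hypothetical stand-in that calls back with made-up headers instead of contacting Google, and a list plays the role of the HTTP response.

```python
import asyncio


def fake_http_get(loop, host, callback):
    """Stand-in for an HTTP request: made-up headers, no network involved."""
    headers = {"location": "http://www.%s/" % host}
    loop.call_soon(callback, headers)


def handle_request(loop, written, finished, delay):
    """Mirror of the request handler: callbacks nested inside callbacks."""

    def on_google_response(headers):
        def end_response():
            written.append(headers["location"])
            finished.set_result(None)

        # Equivalent of setTimeout: run end_response after `delay` seconds.
        loop.call_later(delay, end_response)

    fake_http_get(loop, "google.com", on_google_response)
    # This line runs first, before any of the callbacks above fire.
    written.append("hello ")


def serve_one_request(delay=0.05):
    loop = asyncio.new_event_loop()
    try:
        written = []
        finished = loop.create_future()
        handle_request(loop, written, finished, delay)
        loop.run_until_complete(finished)
        return written
    finally:
        loop.close()


print(serve_one_request())
```

Even in this tiny case the order of execution no longer matches the order of the source lines, which is exactly the readability cost of the callback model.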

In addition, you need to write JavaScript on the server side. However, if you don’t like JS you can write CoffeeScript for Node instead. If you come from languages like Python or Ruby you will probably like CoffeeScript.

Eventlet (The Pythonic Way)

Because I’m a Python enthusiast and I don’t want to write code the way Node Js proposes, I switched my research to Eventlet, a Python library that provides a synchronous interface for asynchronous I/O operations.

Green Threads And Coroutines

Eventlet uses green threads to achieve cooperative sockets. Python’s green threads are built on top of greenlet, a spin-off of Stackless Python that implements coroutines for the Python language. One good thing about green threads is that they are cheap. Spawning a new green thread is much faster than creating a new POSIX thread, and it consumes much less memory too!
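To give a feel for what cooperative switching means, here is a toy scheduler built on plain Python generators. It is only an analogy: greenlet switches whole call stacks from a C extension, while this sketch (with hypothetical names worker and run_round_robin) can only switch at an explicit yield. The idea of tasks handing control back voluntarily, with no OS threads involved, is the same.

```python
from collections import deque


def worker(name, steps, log):
    """A toy task that gives control back to the scheduler after each step."""
    for step in range(steps):
        log.append("%s step %d" % (name, step))
        yield  # cooperative switch point, where real code would wait on I/O


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished, drop it from the queue
        queue.append(task)


log = []
run_round_robin([worker("a", 2, log), worker("b", 1, log)])
print(log)
```

Creating a task here costs one small generator object, which hints at why green threads are so much cheaper than POSIX threads with their own OS-managed stacks.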

Taking advantage of coroutines, Eventlet can patch the socket-related modules of the Python standard library so that their synchronous behaviour becomes asynchronous. This means you don’t need to change your synchronous code to make it asynchronous!

If you want examples of what Eventlet can do read this.

Benchmarking 

Finally I made a little benchmark between a Node Js server, a WSGI server using Eventlet and the HTTPServer from the Python standard library.

The Node Js Server:

var http = require('http');

http.createServer(function (req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
    console.log(req.headers['host'] + " - - [" + req.client._idleStart + "] \"" + req.method + " " + req.url + " " + req.httpVersion + "\" " + res.statusCode + " -");
}).listen(6000, "127.0.0.1");

console.log('Server running at http://127.0.0.1:6000/');

The Eventlet Wsgi Server:

from eventlet import wsgi
import eventlet

def handler(env, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Hello, World!\r\n']

wsgi.server(eventlet.listen(('', 7000)), handler)

The Stdlib HTTP Server :

from SocketServer import ThreadingMixIn
from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-type", "text/plain")
        self.end_headers()
        self.wfile.write('Hello, World!\r\n')

class SimpleHTTPServer(ThreadingMixIn, HTTPServer):
    pass

server = SimpleHTTPServer(("localhost", 8000), Handler)
print "Serving on port: %s" % 8000
server.serve_forever()

Now I have the Node Js server running on port 6000, the Eventlet Wsgi server on port 7000 and the python Http Server on port 8000.

Let’s use the Linux Apache benchmark command (ab) to make 10K requests to each server with a concurrency level of 5:

Python Http Server Results:

Server Software:        BaseHTTP/0.3
Server Hostname:        localhost
Server Port:            8000

Document Path:          /
Document Length:        15 bytes

Concurrency Level:      5
Time taken for tests:   8.956 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1320000 bytes
HTML transferred:       150000 bytes
Requests per second:    1116.51 [#/sec] (mean)
Time per request:       4.478 [ms] (mean)
Time per request:       0.896 [ms] (mean, across all concurrent requests)
Transfer rate:          143.93 [Kbytes/sec] received

Eventlet Wsgi Server Results:

Server Software:
Server Hostname:        localhost
Server Port:            7000

Document Path:          /
Document Length:        15 bytes

Concurrency Level:      5
Time taken for tests:   3.796 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1360000 bytes
HTML transferred:       150000 bytes
Requests per second:    2634.18 [#/sec] (mean)
Time per request:       1.898 [ms] (mean)
Time per request:       0.380 [ms] (mean, across all concurrent requests)
Transfer rate:          349.85 [Kbytes/sec] received

Node Js Server Results:

Server Software:
Server Hostname:        localhost
Server Port:            6000

Document Path:          /
Document Length:        15 bytes

Concurrency Level:      5
Time taken for tests:   1.821 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      790000 bytes
HTML transferred:       150000 bytes
Requests per second:    5489.98 [#/sec] (mean)
Time per request:       0.911 [ms] (mean)
Time per request:       0.182 [ms] (mean, across all concurrent requests)
Transfer rate:          423.54 [Kbytes/sec] received
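As an aid to reading these reports, the derived figures can be recomputed from the request count, the total time and the concurrency level. The helper below is my own arithmetic sketch (ab_metrics is a hypothetical name, not part of ab), fed with the numbers printed above. Because the printed time is rounded to milliseconds, the requests per second come out slightly different from ab’s own figure in the last digits.

```python
def ab_metrics(requests, seconds, concurrency):
    """Recompute the derived figures of an ab report from its raw numbers."""
    requests_per_second = requests / seconds
    # Mean time one request takes as seen by a single concurrent client.
    ms_per_request = concurrency * seconds * 1000 / requests
    # Mean time per request across all concurrent clients together.
    ms_across_all = seconds * 1000 / requests
    return requests_per_second, ms_per_request, ms_across_all


# (complete requests, time taken in seconds, concurrency) from the runs above
runs = {
    "Python Http Server": (10000, 8.956, 5),
    "Eventlet Wsgi Server": (10000, 3.796, 5),
    "Node Js Server": (10000, 1.821, 5),
}

baseline_rps = ab_metrics(*runs["Python Http Server"])[0]

for name, (requests, seconds, concurrency) in runs.items():
    rps, ms_each, ms_all = ab_metrics(requests, seconds, concurrency)
    print("%-22s %8.1f req/s  %.3f ms/request  x%.1f vs stdlib"
          % (name, rps, ms_each, rps / baseline_rps))
```

Reading the results this way makes the comparison easier: at this concurrency level Eventlet serves a bit more than twice as many requests per second as the threaded standard library server, and Node Js close to five times as many.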

Now let’s increase the concurrency level to 100.

Eventlet Wsgi Server Results:

Server Software:
Server Hostname:        localhost
Server Port:            7000

Document Path:          /
Document Length:        15 bytes

Concurrency Level:      100
Time taken for tests:   9.063 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      1360000 bytes
HTML transferred:       150000 bytes
Requests per second:    1103.35 [#/sec] (mean)
Time per request:       90.633 [ms] (mean)
Time per request:       0.906 [ms] (mean, across all concurrent requests)
Transfer rate:          146.54 [Kbytes/sec] received

Node Js Server Results:

Server Software:
Server Hostname:        localhost
Server Port:            6000

Document Path:          /
Document Length:        15 bytes

Concurrency Level:      100
Time taken for tests:   1.463 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Total transferred:      790000 bytes
HTML transferred:       150000 bytes
Requests per second:    6834.49 [#/sec] (mean)
Time per request:       14.632 [ms] (mean)
Time per request:       0.146 [ms] (mean, across all concurrent requests)
Transfer rate:          527.27 [Kbytes/sec] received

Python Http Server Results:

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
apr_socket_recv: Connection reset by peer (104)
Total of 7830 requests completed

Oops! The server breaks with a broken pipe (I ran the test several times and it never completed the 10K requests).

Note: I also ran the test with a concurrency level of 1K and only Node Js passed it. Both Python servers broke at some point.

Conclusion

Based on the benchmarks, I think there’s no room for discussion about which framework is more scalable and more efficient.

However, if you don’t need to handle a huge number of concurrent requests and you want to write your app in pure Python, I recommend Eventlet instead of the standard synchronous socket library. Cheap green threads make the difference when you need to do concurrent I/O operations. In addition, green threads give you deterministic behaviour and avoid the context-switch overhead of POSIX threads and processes. This video shows it better.

A great feature of Eventlet is that you don’t have to rewrite your code to make it asynchronous. You can start with this and learn how to change your application’s behaviour by patching the socket library using Eventlet.

Looking forward

This post was not intended to build an opinion about which framework or library is better, more efficient or more beautiful. It’s just a mind-opener article. I showed you a different model for doing I/O in networking applications. This is just the start! I recommend that you dig deeper into this model of I/O. It looks set to become stronger in the coming years with the advent of real-time web applications and Comet technologies.

Now it’s time to think about my new project… And by the way, it includes non-blocking I/O, a bunch of networking, and of course, Python =).