My Geeky Journal

Wednesday, September 28, 2016

MyBatis + Guice and the Transactional annotation

I wasn't a huge fan of MyBatis, mainly because I hate working with XML. For the project we are working on, we wanted to use the same objects both for persistence and as messages passed to an actor system (Akka) that does some work in parallel, so we really wanted our data objects to be immutable. This ruled out Hibernate.
MyBatis turned out to be very reliable, and creating the mappers was not as painful as I thought.
For dependency injection we decided to use Guice, and the integration between the two proved to be seamless. The problems only started showing up when we wrote some integration tests to check whether the Transactional annotations were working. The bugs were many and hard to debug, so I decided to create a dummy project to figure out, once and for all, how the Transactional annotation works.
In our project we use several databases, so we have to use private modules, which introduce a whole bunch of caveats. Before digging into them, let's start with what the internet says about this.

The Transactional annotation documentation

The official Guice wiki doesn't say much about what affects the Transactional annotation, only that:

The only restriction is that these methods must be on objects that were created by Guice and they must not be private.

Fair enough. MyBatis-Guice's documentation adds some more information about nested transactions, but it still doesn't mention any issues with private modules. The documentation is very poor for such an important piece of the library.

Setting up the environment

For these tests, I created a simple environment: a MyBatis mapper that doesn't really do anything, and a service, implementing an interface, into which I inject the mapper. I also inject the SqlSessionManager into the service, so I can check whether the method is running inside a transaction by calling sqlSessionManager.isManagedSessionStarted().

This is the mapper:

This is the service interface:

This is the service:

The tests

The tests simply create an injector, get an instance of the service and call addValue on it. The method will throw an AssertionError if a transaction was not started, indicating that the Transactional annotation was ignored.
Let's ignore the interface for now and focus on the DummyService implementation.
For the first test I just create a normal MyBatis module and bind the DummyMapper. This test passes.

Now, let's try to run a public method that calls a private method annotated with Transactional. As the Guice documentation states, this test fails.

Let's add private modules to the picture. I bind and expose the mapper in a private module and then try to get an instance of the DummyService. This doesn't work, because the module has to know about the service for the Transactional annotation to work.

To make it work, you need to expose the DummyService explicitly:

Now let's see what happens with interfaces. Let's bind and expose the interface instead of the implementation, and check whether the method still runs inside a transaction when the service comes from the private module (it does).

So far, everything is working as expected (more or less), but then something weird happened. I was trying to fix a bug and growing desperate, so I tried explicitly exposing the Mapper and the SqlSessionManager in addition to the DummyService, to get closer to our product. I thought it wouldn't make a difference, but it did, and this test fails:

Even stranger, it doesn't fail if I bind and expose the implementation directly:

I probably stumbled upon a weird bug. I will update this post if something comes up, but the documentation on this topic is seriously lacking.

This test, farther away from our implementation, reproduces the issue more clearly.

Update

The MyBatis/Guice team promptly got back to me: apparently it's a known Guice issue, but it's "expected behavior". What happens is that Guice instantiates the service in the private module's parent injector, so the interceptors configured in the private module (including the one that handles Transactional) are never applied to it. Suggested workarounds: bind the classes that use the Transactional annotation directly, or use requireExplicitBindings().

Wednesday, January 20, 2016

Which framework should I use to develop a REST application in Python?

Recently, I started looking into REST solutions in Python. Although the options are many, it was hard to find satisfactory products for developing REST applications in a simple, clean way.
To fit my needs, the final product had to satisfy the following criteria:

it has to support Python 3 cleanly;
it should impose as few design constraints as possible on the resulting application;
as a consequence of the previous point, it should make it easy to swap the chosen solution for another at will;
optional, but good to have: it should make it possible, and not too awkward, to use a DI library on top of it.

Here is a list of the frameworks I checked out.

Django REST framework

This one seems to be one of the most adopted, probably because of the popularity of Django. Django REST Framework is a Django plugin that adds some functionality to handle REST more easily. It's very well documented and it's not too hard to customize the logic of specific endpoints, or plug in additional code on the resources.

As with anything in Django, most of the basics work out of the box, and everything is easy and cool as long as you follow the Django (in this case, Django REST framework) way of doing things. The design of your application must follow the rules strictly, and obviously you need to use the entire Django framework, or write some very awkward code.

This makes it almost impossible to change the REST layer, and everything else in your application ends up entangled with Django as well.

Flask RESTful

Flask RESTful is more lightweight than Django REST framework, and seems to be well documented and supported. I like the OO approach and the extensibility. The problem with it is Flask: its documentation basically advises against using Python 3. This makes me very uncomfortable using the framework in general, let alone an extension built on top of it.

Another thing I don't like is that you need to extend Resource to create a resource, and you need to pass constructor arguments through resource_class_args and resource_class_kwargs: this makes it awkward to inject services and to use a DI library.

Werkzeug

Werkzeug is the WSGI library Flask is based upon. It's a very thin layer, so it doesn't force your design too much. The only complaint may be that the layer is almost too thin: it doesn't do much to hide WSGI. Another big point against it is that, according to their website, support for Python 3 is still highly experimental.

Falcon

Falcon ended up being my framework of choice. I love its minimalistic approach: it uses duck typing for resources, and it takes already-initialized resource objects when configuring the application. This makes it very easy to use DI with it, and it encourages developers to follow the single responsibility principle. It's also advertised as being extremely fast compared to other frameworks, although I'm not really concerned with performance at this level when coding in Python. And it supports Python 3!

I'll publish a post with my approach to REST development with Falcon in the near future.

Conclusion

Obviously, this post is not an evaluation of the quality of these frameworks per se, but rather a collection of thoughts on which library fits my needs best. I'd like to get your feedback, and feel free to suggest more frameworks.

Tuesday, December 4, 2012

Python: performance comparison of itertools dropwhile and takewhile against simple generators

While wandering around the internet, I stumbled upon this thread, which discusses whether dropwhile and takewhile should be deprecated, and later removed, from the itertools library. According to Hettinger, who wrote itertools, dropwhile and takewhile lead to less readable code, since everything they are used for can be implemented with generators. While reading, I thought: 'OK, maybe they are less readable, but shouldn't the purpose of itertools be to provide a toolbox for performing loops efficiently, mainly in pure C code? Even if generators are more readable, which is debatable, there's no way generators can be faster than pure C code.' Since efficiency wasn't mentioned in the discussion, I decided to profile generators against dropwhile and takewhile myself.

In the discussion mentioned above, the following use case is used to illustrate dropwhile/takewhile: iterating over text delimited by start and end markers.

The result was quite surprising: in this case, generators are faster than dropwhile and takewhile by around 30-40%. I tested the code with both Python 2.7.3 and Python 3.3.0, with similar results (Python 3.3.0 being slower for both functions).

Here is the timing code:

import timeit
from itertools import dropwhile, takewhile

FILE_PATH = "test_data/data/text_with_start_end_markers.txt"
with open(FILE_PATH) as f:
    FILE = f.readlines()
START_MARKER = 'start_marker'
END_MARKER = 'end_marker'

def iter_block_generator(lines, start_marker, end_marker):
    lines = iter(lines)
    # skip everything before the start marker, yielding the marker line itself
    for line in lines:
        if line.startswith(start_marker):
            yield line
            break
    # yield the following lines until the end marker (excluded)
    for line in lines:
        if line.startswith(end_marker):
            return
        yield line

def iter_block_itertools(lines, start_marker, end_marker):
    # dropwhile skips up to the start marker, takewhile stops at the end marker
    return takewhile(lambda x: not x.startswith(end_marker),
                     dropwhile(lambda x: not x.startswith(start_marker),
                               lines))


print("check that both solutions return the same result:")
join_using_itertools = "".join(
    iter_block_itertools(FILE, START_MARKER, END_MARKER))
join_using_generator = "".join(
    iter_block_generator(FILE, START_MARKER, END_MARKER))
assert join_using_itertools == join_using_generator

iter_block_generator_func = \
    "''.join(iter_block_generator(FILE, START_MARKER, END_MARKER))"
iter_block_itertools_func = \
    "''.join(iter_block_itertools(FILE, START_MARKER, END_MARKER))"

for function in (iter_block_generator_func, iter_block_itertools_func):
    print(function)
    print(timeit.repeat(
        function,
        repeat=1,
        number=5,
        setup="from __main__ import "
              "iter_block_generator, iter_block_itertools, "
              "FILE, START_MARKER, END_MARKER"))

I'm probably missing something, so let me know if you find a good reason for this.
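If you want to reproduce the comparison without the original text file, here is a minimal sketch under a few assumptions: the input is synthetic (a hypothetical make_lines helper builds filler lines around a block delimited by the two markers), and each function is timed with timeit.repeat on a callable, keeping the best of three runs. One plausible factor behind the numbers, not verified here, is that dropwhile and takewhile call a Python lambda for every line, while the generator calls line.startswith directly inside its own loop, so the C-level iteration doesn't avoid a Python function call per line.

```python
import timeit
from itertools import dropwhile, takewhile

START_MARKER = "start_marker"
END_MARKER = "end_marker"


def make_lines(before=1000, inside=10000, after=1000):
    """Hypothetical synthetic input: filler, start marker, block, end marker, filler."""
    lines = ["filler %d\n" % i for i in range(before)]
    lines.append(START_MARKER + "\n")
    lines.extend("payload %d\n" % i for i in range(inside))
    lines.append(END_MARKER + "\n")
    lines.extend("filler %d\n" % i for i in range(after))
    return lines


def iter_block_generator(lines, start_marker, end_marker):
    lines = iter(lines)
    for line in lines:
        if line.startswith(start_marker):
            yield line
            break
    for line in lines:
        if line.startswith(end_marker):
            return
        yield line


def iter_block_itertools(lines, start_marker, end_marker):
    return takewhile(lambda x: not x.startswith(end_marker),
                     dropwhile(lambda x: not x.startswith(start_marker), lines))


def benchmark(number=50):
    """Return the best of three timings (in seconds) for each implementation."""
    lines = make_lines()
    results = {}
    for func in (iter_block_generator, iter_block_itertools):
        # the lambda is called by timeit inside this iteration, so `func` is the current one
        results[func.__name__] = min(timeit.repeat(
            lambda: "".join(func(lines, START_MARKER, END_MARKER)),
            repeat=3, number=number))
    return results


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print("%s: %.4f s" % (name, seconds))
```

Timings depend heavily on the interpreter version and machine, so treat any numbers this prints as relative, not absolute.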

Friday, November 25, 2011

Wordpress get_terms with parent id returns an empty array

I stumbled on what looks like some sort of bug (probably in my own code). I spent the last hour trying to understand why this code:

$subterms = get_terms("customtax",
   array(
     "hide_empty"=>0,
     'parent' => $parent_term->term_id,
   )
  );

returned an empty array, while I was sure my custom taxonomy had lots of children.
Digging into the WordPress code, I found that get_terms calls a function called "_get_term_hierarchy".

This function uses a simple caching system: it looks for an option named "customtax_children" in the WordPress database, and rebuilds it only if the option doesn't exist yet. A stored empty array counts as a valid cache, so it is returned as-is.

In my database, that option held an empty array, stored as the serialized string "a:0:{}".

All I had to do was delete that option to force WordPress to rebuild the cache (the table is wp_options with the default table prefix):

delete from wp_options where option_name='customtax_children';

I hope this helps somebody.

Wednesday, October 26, 2011

Gaussian filter python implementation

This post is, hopefully, part of a bigger tutorial about edge detection. My final goal is to implement a Canny edge detector in Python; it's just an exercise to get a better understanding of the subject.

The first step in the Canny algorithm is to apply a Gaussian filter to the image, in order to get rid of some of the noise that would make edge detection harder.

I used this guide as a reference.

One-dimensional window

Here is the algorithm that applies the Gaussian filter to a one-dimensional list. The first step is to calculate the window weights; then, for every element in the list, we place the window over it, multiply the elements by their corresponding weights and sum them up.


from math import exp

def get_window_weights(N):
    # 2N + 1 support points spread over [-3, 3] standard deviations, squared
    support_points = [(float(3 * i)/float(N))**2.0 for i in range(-N,N + 1)]
    gii_factors = [exp(-(i/2.0)) for i in support_points]
    ki = float(sum(gii_factors))
    # normalize so that the weights sum to 1
    return [giin/ki for giin in gii_factors]

def apply_filter(index,array,window):
    N = (len(window)-1)//2
    # pad both ends with the edge values to avoid out-of-range indices
    array_l = [array[0] for i in range(N)] + array + [array[-1] for i in range(N)]
    return sum(float(array_l[N + index + i]) * window[N+i]
               for i in range(-N,N+1))

def gaussian_filter(data,window_weights,filter_func = apply_filter):
    ret = []
    for i in range(len(data)):
        ret.append(filter_func(i,data,window_weights))
    return ret
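For reference, the weights computed by get_window_weights are samples of a normalized Gaussian. With window half-width N, the code evaluates the first equation below for each offset i from -N to N, which amounts to a Gaussian with standard deviation σ = N/3, so the window spans ±3σ. apply_filter then computes each output sample y_k as a weighted sum of its neighbors; the edge padding is equivalent to clamping indices to the list bounds, where L (a symbol introduced here) is the length of the list:

```latex
w_i = \frac{\exp\!\left(-\frac{1}{2}\left(\frac{3i}{N}\right)^2\right)}
           {\sum_{j=-N}^{N} \exp\!\left(-\frac{1}{2}\left(\frac{3j}{N}\right)^2\right)}
    = \frac{\exp\!\left(-\frac{i^2}{2\sigma^2}\right)}
           {\sum_{j=-N}^{N} \exp\!\left(-\frac{j^2}{2\sigma^2}\right)},
\qquad \sigma = \frac{N}{3}, \quad i = -N, \dots, N

y_k = \sum_{i=-N}^{N} w_i \, x_{\min(\max(k+i,\,0),\,L-1)}
```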

In order to apply the filter to images, we need a function that works on pixels. The basic idea is to perform the same operations on every color component.


from functools import reduce

def sum_filtered_pixels(pix1,pix2):
    # component-wise sum of two color tuples
    return tuple([pix1[i]+pix2[i] for i in range(len(pix1))])

def apply_filter_to_pixel(index,array,window):
    N = (len(window)-1)//2
    # pad both ends with the edge pixels to avoid out-of-range indices
    array_l = [array[0] for i in range(N)] + array + [array[-1] for i in range(N)]
    # weight every color component of each neighbor, then sum the weighted tuples
    return reduce(sum_filtered_pixels,
                  (tuple([float(v) * window[N+i] for v in array_l[N + index + i]])
                   for i in range(-N,N+1)))

Bidimensional window

Obviously, we need to apply the filter to an image, so we need it to work with a two-dimensional window. As explained in the guide, we can use a divide-et-impera approach and reuse the one-dimensional algorithm (this works because the 2D Gaussian kernel is separable). All we have to do is run the 1D Gaussian filter over all the pixel rows, and then again over all the columns.
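Why does the row-then-column trick give the same result as a full 2D convolution? The 2D Gaussian kernel is the outer product of the 1D window with itself, so the double sum factors into two single sums. The following self-contained sketch checks this numerically on a small grid of plain numbers (no PIL). The helper names (gaussian_weights, blur_1d, blur_2d_direct, blur_2d_separable) are hypothetical; borders are handled by clamping indices to the edge, which matches the padding used in apply_filter.

```python
from math import exp


def gaussian_weights(n):
    # same formula as get_window_weights: sigma = n / 3, 2n + 1 taps, normalized
    raw = [exp(-((3.0 * i / n) ** 2) / 2.0) for i in range(-n, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def clamp(i, lo, hi):
    return max(lo, min(i, hi))


def blur_1d(row, w):
    """Weighted sum over the window, clamping indices at both ends."""
    n = (len(w) - 1) // 2
    last = len(row) - 1
    return [sum(w[k + n] * row[clamp(i + k, 0, last)] for k in range(-n, n + 1))
            for i in range(len(row))]


def blur_2d_direct(img, w):
    """Convolve with the full 2D kernel w[a] * w[b]."""
    n = (len(w) - 1) // 2
    h, wd = len(img), len(img[0])
    out = [[0.0] * wd for _ in range(h)]
    for y in range(h):
        for x in range(wd):
            acc = 0.0
            for a in range(-n, n + 1):
                for b in range(-n, n + 1):
                    acc += (w[a + n] * w[b + n]
                            * img[clamp(y + a, 0, h - 1)][clamp(x + b, 0, wd - 1)])
            out[y][x] = acc
    return out


def blur_2d_separable(img, w):
    rows = [blur_1d(r, w) for r in img]                # pass 1: every row
    cols = [blur_1d(list(c), w) for c in zip(*rows)]   # pass 2: every column
    return [list(r) for r in zip(*cols)]               # transpose back to rows


if __name__ == "__main__":
    img = [[(x * 7 + y * 13) % 10 for x in range(6)] for y in range(5)]
    w = gaussian_weights(2)
    direct, separable = blur_2d_direct(img, w), blur_2d_separable(img, w)
    print(max(abs(direct[y][x] - separable[y][x]) for y in range(5) for x in range(6)))
```

The printed maximum difference should be at the level of floating-point rounding. The separable version costs about 2(2N + 1) multiplications per pixel instead of (2N + 1)², which is the practical reason for running two 1D passes.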


from PIL import Image

def gaussian_filter_2d(matrix,window_weights,filter_func = apply_filter):
    new_matrix = []
    # apply the 1d gaussian filter row by row
    for i in range(len(matrix)):
        new_matrix.append(gaussian_filter(matrix[i],window_weights,filter_func))
    # then apply it column by column on the row-filtered matrix
    for i in range(len(matrix[0])):
        temp_list = gaussian_filter([new_matrix[t][i] for t in range(len(matrix))],
                                    window_weights,filter_func)
        for t in range(len(matrix)):
            new_matrix[t][i] = temp_list[t]
    return new_matrix

def gaussian_blur(img_in,img_out,window_size):
    # convert to RGB so that getpixel always returns an (r, g, b) tuple
    img = Image.open(img_in).convert("RGB")
    width,height = img.size
    window_weights = get_window_weights(window_size)
    pixmap = gaussian_filter_2d([
                                 [img.getpixel((w,h)) for w in range(width)]
                                 for h in range(height)
                                ],
                                window_weights,
                                apply_filter_to_pixel)

    new_image = Image.new("RGB",(width,height))
    for h in range(height):
        for w in range(width):
            new_image.putpixel((w,h), (int(pixmap[h][w][0]),
                                       int(pixmap[h][w][1]),
                                       int(pixmap[h][w][2])))
    new_image.save(img_out)

if __name__ == '__main__':
    gaussian_blur('wombat.jpg',"wombat_blurred.jpg",5)

Here is the result:

Monday, October 24, 2011

HowTo: avoid 'Complete surveys to unblock the website'

Just a stupid hint.

Whenever you find yourself in front of a page that blocks you, asking you to complete a survey in order to keep browsing, and all you're looking for is a simple link on the blocked page (say, a megavideo link), just do this:

Go to the page source (CTRL + u on Google Chrome);
search for the link in the source (CTRL + f on Google Chrome), using some keyword (for example "megavideo.com");
copy/paste the link (inside <a href="***">);
enjoy.

Thursday, August 11, 2011

Simple way to solve clang import errors on Ubuntu 11.04

If clang gives you errors such as:

/usr/include/linux/errno.h:4:10: fatal error: 'asm/errno.h' file not found

just manually set the right include path so that it can find the necessary headers. For me (Kubuntu 11.04, 64-bit) it worked with:

clang -I/usr/include/x86_64-linux-gnu/