Organizing Media files on Linux

Recently I was doing some cleanup on my Dropbox account, and I found that some folders (especially “Camera Uploads”) had like a million unorganized images and video files that were auto-uploaded from my mobile phone and other devices. Unfortunately Dropbox doesn’t automatically organize these files into date folders, or even give you an option to do so. So I decided to write a script to organize the files myself in the Dropbox folder on my Linux desktop. Below is a simple shell script that can do this.

This script moves the media files into date folders, which are created using the last-modified date of each file. The script is also available on GitHub and uses exiftool to determine the last-modified date. exiftool can be installed on Ubuntu using the following command:
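Since the original listing isn’t reproduced here, below is a minimal sketch along the same lines. The Ubuntu package name (libimage-exiftool-perl) and the YYYY-MM folder layout are assumptions of this sketch; it uses exiftool’s FileModifyDate when the tool is available and falls back to the plain filesystem mtime otherwise.

```shell
#!/bin/bash
# Install exiftool first (package name assumed for Ubuntu/Debian):
#   sudo apt-get install libimage-exiftool-perl

# organize_media: move every regular file in the given directory into a
# YYYY-MM subfolder named after its last-modified date.
organize_media() {
    local src="${1:-.}" f d
    for f in "$src"/*; do
        [ -f "$f" ] || continue
        if command -v exiftool >/dev/null 2>&1; then
            # -s3 prints only the value; -d sets the date format
            d=$(exiftool -s3 -d '%Y-%m' -FileModifyDate "$f")
        else
            # fallback: plain filesystem modification time
            d=$(date -r "$f" '+%Y-%m')
        fi
        mkdir -p "$src/$d"
        mv -- "$f" "$src/$d/"
    done
}
```

Running `organize_media ~/Dropbox/Camera\ Uploads` would then sort the uploads into monthly folders in place.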

Clustering and Load balancing Tomcat servers

Tomcat is by far the most popular open source Java servlet container, and it can easily be scaled to ensure high availability. In this post I will show how easy it is to distribute and scale Java web applications using Tomcat clusters. In this example we will also use the Apache web server to load balance the cluster.

Below is a simple setup of a cluster with two Tomcat 7 nodes and Apache web server 2.2 as a load balancer.

[Diagram: two Tomcat 7 nodes clustered behind the Apache 2.2 load balancer]

Clustering

Load-balanced clustering in Tomcat is done in four simple steps:

1. Uniquely name each node
2. Configure clustering in server.xml
3. Configure session persistence in context.xml (optional)
4. Enable distribution in the Java web application

The following configuration is done on each Tomcat instance, whether the instances run on the same machine or on separate virtual or physical servers. If multiple instances run on the same machine, make sure that each instance listens on different ports.

Naming the tomcat nodes

The server node can be named by adding the jvmRoute attribute to the Tomcat Engine element, as shown below. This node name is appended to the session id.
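For example, on the first node (the name node1 is just a placeholder; each instance needs its own unique name):

```xml
<!-- server.xml: name this instance; the jvmRoute value is appended to session ids -->
<Engine name="Catalina" defaultHost="localhost" jvmRoute="node1">
  <!-- Host, Realm, etc. unchanged -->
  ...
</Engine>
```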

Configuring the cluster

Clustering can easily be enabled in Tomcat by adding the following to the server.xml file:
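In its simplest form this is the single, all-defaults cluster element from the Tomcat documentation, placed inside the Engine (or Host) element:

```xml
<!-- server.xml: enable clustering with default settings -->
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>
```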


This will also enable session replication on all Tomcat nodes. Below is a detailed example that shows more options, including multicast and deployer configuration. See the Tomcat documentation for more info.
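A fuller configuration, along the lines of the example in the Tomcat 7 clustering documentation, might look like this (the multicast address/port and the deployer directories are the documented defaults; this is a sketch, not the post’s original listing):

```xml
<Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"
         channelSendOptions="8">
  <!-- replicate session deltas to the other nodes -->
  <Manager className="org.apache.catalina.ha.session.DeltaManager"
           expireSessionsOnShutdown="false"
           notifyListenersOnReplication="true"/>
  <Channel className="org.apache.catalina.tribes.group.GroupChannel">
    <!-- nodes discover each other via multicast heartbeats -->
    <Membership className="org.apache.catalina.tribes.membership.McastService"
                address="228.0.0.4" port="45564"
                frequency="500" dropTime="3000"/>
    <Receiver className="org.apache.catalina.tribes.transport.nio.NioReceiver"
              address="auto" port="4000" autoBind="100"
              selectorTimeout="5000" maxThreads="6"/>
    <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
      <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"/>
    </Sender>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.TcpFailureDetector"/>
    <Interceptor className="org.apache.catalina.tribes.group.interceptors.MessageDispatch15Interceptor"/>
  </Channel>
  <Valve className="org.apache.catalina.ha.tcp.ReplicationValve" filter=""/>
  <Valve className="org.apache.catalina.ha.session.JvmRouteBinderValve"/>
  <!-- optional: push deployed WARs to the other nodes in the cluster -->
  <Deployer className="org.apache.catalina.ha.deploy.FarmWarDeployer"
            tempDir="/tmp/war-temp/" deployDir="/tmp/war-deploy/"
            watchDir="/tmp/war-listen/" watchEnabled="false"/>
  <ClusterListener className="org.apache.catalina.ha.session.ClusterSessionListener"/>
</Cluster>
```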

Configuring session persistence (Optional)

Session persistence is configured in the context.xml file by defining a persistence manager. By default sessions are kept in memory, but disk or database storage can optionally be configured. The example below shows how sessions are persisted to the local disk using a session store.
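A sketch of a file-based persistence manager (the storage directory is a placeholder):

```xml
<!-- context.xml: swap sessions to local disk after 30s of inactivity -->
<Manager className="org.apache.catalina.session.PersistentManager"
         maxIdleBackup="30">
  <Store className="org.apache.catalina.session.FileStore"
         directory="/var/tomcat/sessions"/>
</Manager>
```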

Making the application distributable

A Java web application can be configured to run in a cluster by adding the following to the application’s web.xml file.
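It is a single empty element inside web-app:

```xml
<!-- web.xml: mark the application as distributable so its sessions replicate -->
<distributable/>
```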

Once all this configuration is done on each node, the servers will run as a cluster with session replication enabled.


Load balancing the cluster

Now the next step is to configure load balancing, to make effective use of the scalable Tomcat cluster and to ensure high availability: if one of the nodes in the cluster goes down, the application remains available on the other node, and with session replication enabled the service and user experience are not impacted.

In this example I am using Apache’s mod_proxy_ajp to load balance the two Tomcat instances. The configuration is fairly simple and is added to the httpd.conf file. The example also shows how session stickiness is enabled. The loadfactor defines the weighted load that is applied to each node. The lbmethod configures the load balancing strategy, which can be either byrequests or bytraffic.
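A sketch of such a configuration (the hostnames, ports and the application path /myapp are assumptions of this example; the route values must match each node’s jvmRoute for stickiness to work):

```apache
# httpd.conf: balance two Tomcat nodes over AJP with sticky sessions
<Proxy balancer://tomcatcluster>
    BalancerMember ajp://node1.example.com:8009 route=node1 loadfactor=1
    BalancerMember ajp://node2.example.com:8009 route=node2 loadfactor=1
    ProxySet lbmethod=byrequests stickysession=JSESSIONID
</Proxy>
ProxyPass        /myapp balancer://tomcatcluster/myapp
ProxyPassReverse /myapp balancer://tomcatcluster/myapp
```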

Note: mod_proxy_ajp is not the only way Apache can load balance Tomcat instances; the other commonly used methods are mod_jk and mod_proxy.

Demise of J2ME

Since the arrival of iOS and Android smartphones, J2ME has really faded into the background. And it is not because iOS and Android are better platforms; it is because J2ME didn’t evolve with the rapid technological growth of mobile devices. J2ME was originally intended for low-end devices, as a lightweight Java runtime, and it was very successful. But we haven’t seen any enhancements to the platform targeting the high-end devices that are common today. There have been some maintenance releases here and there, but no substantial improvements to the platform. Some proposed enhancements are lying dormant in the political and ever-bureaucratic JCP. And since Oracle acquired Sun Microsystems and started pushing its ADF platform for corporate mobile solutions, the future of J2ME is looking very bleak.

No one has anything interesting to say about J2ME anymore, and its market share is gradually decreasing. It is hard to understand why J2ME, which once used to be the only real mobile platform supported by the majority of phone vendors, has been orphaned. The market did show some positive vital signs for J2ME when the sales of iOS and Android dipped last year, which had some people excited; it also showed that the platform might still have potential. But Oracle, for some strange reason, failed to capitalize on it.

Oracle may brag about millions of devices running Java, but the truth is Oracle doesn’t have a proper platform for modern mobile devices. Realistically speaking, today J2ME cannot compete with the likes of iOS, or its cousin Android, which provides a full-featured (though non-standard and controversial) Java 5 runtime for mobile, along with many open source goodies from Apache. From the end-user perspective, J2ME does not give you the same feature-rich user experience as Android, iOS or even BlackBerry. J2ME developers also never got a standard app store, available on all mobile devices, where they could sell their apps, which is one of the major success factors of iOS and Android. And with the success of cross-platform HTML5/AJAX-based technologies, it will be pretty hard for J2ME to compete unless it brings something new to the table, which doesn’t seem likely with all these years of EDS.

It would have been nice to see a standards-based Java mobile platform backed by the JCP. Android did come close to becoming the next J2ME, which was also good news for Java being open, but its success was marred by the Oracle lawsuits. In the end, as a developer it is sad to witness the gradual but imminent demise of J2ME. So R.I.P. J2ME, you had your day and it was nice knowing you.

Sharing VPN connection on Linux

Most VPN servers allow a single remote session per user, which is all you need most of the time. But sometimes it is necessary to connect multiple devices to the VPN server, which is impossible with a single user account if the server doesn’t allow it. There is a way around this problem: share the VPN connection from a central node with other computers, by using the wireless adapter of the central computer as a hotspot. The idea is fairly simple, provided the central computer has two network cards:

  1. Use a central computer to connect to the VPN via ethernet or one of the network cards
  2. Set up a hotspot on the central computer so that devices in range can connect to it over wifi
  3. Route all traffic (inbound & outbound) from the hotspot to the ethernet/VPN connection
The diagram below illustrates this.
[Diagram: devices connect over wifi to the central computer’s hotspot, and the hotspot traffic is routed through the ethernet/VPN connection]

So how do we do this? Below is an example of setting up this configuration on a Linux box; I used a Linux Mint desktop in this example. Here are the steps:

  1. Install and configure the hostapd application so that you can turn your wireless adapter into a hotspot
  2. Install and configure a DHCP server so that IP addresses are assigned to devices connected to the hotspot
  3. Allow IP masquerading to share the ethernet/VPN connection with the devices connected to the hotspot

Install and configure hostapd

Use the following command to install the hostapd application
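On Debian-based distributions like Mint, hostapd is available in the package repositories:

```shell
sudo apt-get install hostapd
```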

Configure hostapd by editing the /etc/hostapd/hostapd.conf file as follows
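A minimal WPA2 access-point configuration might look like this (the SSID, passphrase and interface name are placeholders for this sketch, not the post’s original file):

```
# /etc/hostapd/hostapd.conf
interface=wlan0
driver=nl80211
ssid=vpn-share
hw_mode=g
channel=6
auth_algs=1
wpa=2
wpa_passphrase=ChangeMe123
wpa_key_mgmt=WPA-PSK
rsn_pairwise=CCMP
```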

You can check the wireless interface name using the iwconfig command; on my machine the interface name was wlan0. Now you can start hostapd using the following command:
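The -B flag runs hostapd in the background as a daemon:

```shell
sudo hostapd -B /etc/hostapd/hostapd.conf
```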

Install and configure dhcp

Install the dhcp server using the following command
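On Ubuntu/Mint the server ships in the isc-dhcp-server package (the same package whose config files are edited below):

```shell
sudo apt-get install isc-dhcp-server
```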

Edit the /etc/dhcp/dhcpd.conf file to set up the subnet by adding the following lines to the file
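A sketch of a small subnet definition (the 10.10.0.0/24 range and the DNS server are assumptions; pick any private range you like):

```
# /etc/dhcp/dhcpd.conf: hand out addresses to hotspot clients
subnet 10.10.0.0 netmask 255.255.255.0 {
    range 10.10.0.10 10.10.0.100;
    option routers 10.10.0.1;              # the central computer's wlan address
    option domain-name-servers 8.8.8.8;
}
```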

Edit /etc/default/isc-dhcp-server and add the wireless network interface name like below:
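Assuming the wireless interface is wlan0 (on some releases the variable is INTERFACESv4 instead):

```
# /etc/default/isc-dhcp-server
INTERFACES="wlan0"
```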

Configure a new interface and start the dhcp server
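Give the wireless interface the gateway address from the subnet sketch above, then restart the server (addresses are assumptions; adjust to your own subnet):

```shell
sudo ifconfig wlan0 10.10.0.1 netmask 255.255.255.0 up
sudo service isc-dhcp-server restart
```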

Allow IP masquerading

Now, when the Linux box is connected to the VPN, we can share the VPN connection over the wifi hotspot by running the following commands:
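A sketch using iptables, assuming the VPN interface is tun0 and the hotspot interface is wlan0:

```shell
# enable kernel IP forwarding
sudo sysctl -w net.ipv4.ip_forward=1
# masquerade hotspot traffic out through the VPN interface
sudo iptables -t nat -A POSTROUTING -o tun0 -j MASQUERADE
# forward traffic between the hotspot and the VPN
sudo iptables -A FORWARD -i wlan0 -o tun0 -j ACCEPT
sudo iptables -A FORWARD -i tun0 -o wlan0 -m state --state RELATED,ESTABLISHED -j ACCEPT
```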

In this example the VPN interface is tun0; you can check the interface name using the ifconfig command (iwconfig lists only the wireless interfaces).

So now VPN sharing is set up, and all your devices (computers, tablets, smartphones etc.) connected to the hotspot of your central Linux box can access all the available network resources on the VPN.

Near Field Communication (NFC)

Near Field Communication (NFC) is a short-range wireless/radio-frequency technology. It involves communicating data between devices (an “initiator” and a “target”) in close proximity, normally less than 10 centimeters. In the realm of mobile phones, the “initiator” is the mobile handset and the target is typically an RFID (Radio Frequency Identification) tag for passive communication. To understand it: RFID tags used for passive communication can be thought of as QR/barcodes, with the smartphone as a “reader”. But with NFC it is also possible to have “active” communication, which requires powered RFID tags, enabling peer-to-peer communication between the tag and the NFC-enabled device, similar to Bluetooth.

Now this technology opens up a realm of possibilities: from reading messages from RFID smart posters, to using the mobile phone as keys, to mobile payment services, in which your smartphone becomes your smart wallet, serving as a credit card or, depending on the payment model, charging a credit card with an embedded RFID tag. The technology is still new but is being widely adopted by smartphone hardware & software vendors, and Android 2.3 already provides a high-level API for writing NFC applications. Google’s Nexus S is now available in most countries with built-in NFC support, which enables it to read RFID tags; the iPhone 5 & iPad 2 are rumored to launch with NFC chips as well. And as soon as this technology is well understood and fully standardized, with the security issues resolved, we will see it being adopted by more and more companies and financial institutions.

A free software license

There are many free software licenses available these days, from viral and stringent licenses like the GPL, through somewhat grey and confusing ones like the LGPL, to business-friendly licenses like the ASL and X11. All of these licenses are “open source”, which means the software source is available to the public along with its binaries. That is great for open source, and I personally like the ASL among the lot and tend to use it for most of my open source software. But sometimes you might feel that these licenses are not “free” enough, and pack tons of terms and conditions in pretty complicated language. There should be a software-specific license which doesn’t restrict the user in any way and allows them to freely use, modify and distribute the software however they like. And the derived work should be allowed for both commercial and personal use, with or without a charge.

Now, with the above in mind, and as a developer of free software, here is my attempt to create a reusable free software license. Let’s call it, for obvious reasons, The Free Software License (TFSL). I must also mention that this license is inspired by the GPL, ASL and WTFPL, and I have borrowed a few things from them. This license is also DRM-free and is composed of the following three important parts:

  1. Terms of use
  2. Warranty
  3. Liability

I think these three parts fully explain how the software may be used, and also protect the developer/vendor from any damages incurred from the use of the software. This license also makes sure that the software licensed under TFSL is distributed for free.

THE FREE SOFTWARE LICENSE
Version 1, April 2011

Copyright (C) 2011 Kamran Zafar <kamran@kamranzafar.org>

Everyone is permitted to copy and distribute verbatim or modified copies of this license document, and changing is allowed as long as the name is changed.

THE FREE SOFTWARE LICENSE
TERMS AND CONDITIONS FOR USE, MODIFICATION, REPRODUCTION AND DISTRIBUTION

0. You can use, modify, reproduce and distribute the software and any of its components in any way you want.
1. The software is available on “AS IS” basis and comes with absolutely NO warranty, either express or implied, to the extent permitted by applicable law.
2. In any event and under any legal theory, the copyright owner of the software will NOT be liable for any direct or indirect damages incurred by the existence or the use of the software.

This “free software license” is available for free and can be used by any software vendor to distribute software, as long as the software meets the terms and conditions outlined in its text. In order to apply this license to the software, each source file of the software MUST contain the following wording.

Copyright [year] [name of copyright owner]

This program is a free software and comes with absolutely no warranty. You can use, modify, reproduce and distribute this software under the terms and conditions of The Free Software License.

You can obtain the copy of The Free Software License at

http://kamranzafar.org/licenses/tfsl.txt

Now, like I said, this is just an attempt to create such a free software license, and I am sure it is not perfect, but it can be modified as per vendor requirements. This license is also available for download here.

Why is git better than svn

Well, I am not going to start debating this question, as most people on the web are doing. I am not going to list the pros and cons of both svn and git and make a vague conclusion in the end. As a user of both svn and git, I found git better for the kind of work I do and the way I do it. I was mainly an svn user before I heard people talking about git being easier than svn; after I started using git, I agree with them. With git it is much easier, simpler, faster and more natural (in the realm of development) to manage your code, to create and clone repos, and to branch, tag and merge code. Apart from being simple and powerful, git is also decentralized and distributed in nature, which undoubtedly is git’s main advantage over svn.
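The everyday operations mentioned above can be shown in a few commands, all against a local repository with no server involved (repository name, file and messages are just placeholders):

```shell
cd "$(mktemp -d)"                         # work in a throwaway directory
git init myrepo && cd myrepo
git config user.email "dev@example.com"   # identity for the example commits
git config user.name "Dev"
echo "hello" > readme.txt
git add readme.txt
git commit -m "initial commit"
main=$(git symbolic-ref --short HEAD)     # default branch (master or main)
git checkout -b feature                   # create and switch to a branch
echo "feature work" >> readme.txt
git commit -am "feature change"
git checkout "$main"
git merge feature                         # fast-forward merge back
git tag v1.0                              # tag the result
```

With svn, the branch and tag steps alone would mean remembering repository URLs; here everything is a local, instant operation.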

I found git better because I no longer have to think about the problems and limitations associated with svnsync; with git I can clone my repositories and pull and push my changes from anywhere in the world. In fact git allows ubiquitous transactions, so one doesn’t have to be connected to the main repo (as in svn) to check in changes. One also doesn’t have to remember branch or tag locations as with svn. I sometimes work while I am traveling, in a different city or country; with git I don’t have to care where in the world I am pushing files from, or to which mirror of which repository. In other words, I no longer indulge in the master-slave repo setups of svn; in git every clone of a repo is truly “equal”, master and slave at the same time. Security is one area where git needs more focus, because it is decentralized; but the asymmetric encryption used on transports like SSH seems to suffice.

There are many other things one can talk about, there are pros and cons of both SCMs. But in the end I would say that gradually people should try and move away from a centralized repo to a more distributed SCM because of many advantages, some of which are described above.

Crawl, index and search

Sometimes you need to search for files or pages on content-rich websites or browser-based information software/encyclopedias that don’t really have a search function, and it can be a pain to find what you are looking for. I once wrote a little crawler in Python that works well for searching websites on the fly, but sometimes a real “index” is needed. A few libraries are available for this, and among them is the open source Apache Lucene, an excellent high-performance text search engine library that anyone can use for free. Couple Lucene with a multi-threaded web crawler and you have pretty good index-and-search functionality; not as good as Google, but close.

Below is an example of how you can use Lucene to build searchable indexes for websites.
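The original listing isn’t reproduced here; the sketch below shows the general shape of it against the Lucene 3.x API (the index directory, field names, page URL and content are hypothetical stand-ins for what the crawler would feed in):

```java
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
    public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.open(new File("site-index"));
        IndexWriterConfig cfg = new IndexWriterConfig(
                Version.LUCENE_36, new StandardAnalyzer(Version.LUCENE_36));
        IndexWriter writer = new IndexWriter(dir, cfg);

        // In the real example the crawler feeds page URLs and text here.
        String url = "http://example.com/page.html";        // hypothetical page
        String text = "page content extracted by the crawler";

        Document doc = new Document();
        // store the URL so search results can link back to the page
        doc.add(new Field("url", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
        // analyze the page text so it is full-text searchable
        doc.add(new Field("content", text, Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);

        writer.close();
    }
}
```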

Once the index is created, we can start searching it for content. Lucene provides the IndexSearcher class, which is used to search the index using a Query. Below is an example that searches the index created above and prints the website URL where the required content is found.
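Again a sketch against the Lucene 3.x API, matching the field names assumed in the indexing sketch above (the query string is a placeholder):

```java
import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher(
                IndexReader.open(FSDirectory.open(new File("site-index"))));
        QueryParser parser = new QueryParser(Version.LUCENE_36, "content",
                new StandardAnalyzer(Version.LUCENE_36));
        Query query = parser.parse("page content");   // hypothetical search terms

        // fetch the top 10 hits and print the stored URL of each
        for (ScoreDoc hit : searcher.search(query, 10).scoreDocs) {
            Document doc = searcher.doc(hit.doc);
            System.out.println(doc.get("url"));
        }
        searcher.close();
    }
}
```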

The full source of this example, including the web crawler, can be found here and is available under the GPL.

Project migration from Sourceforge to Googlecode

I have been using googlecode for some of my recent open source development work, and I was surprised how much it speeds up development. The SCM is very fast and gave me no trouble, and it is easy to create wiki pages, project documentation and so on. It offers limited features compared to sourceforge, but its real power is in its simplicity. Sourceforge offers more features, like hosting web pages and shell services, and if you are smart you can even create your own little Maven repository for your artifacts; one might argue that all these features make sourceforge very complex. But recently sourceforge has become slow as hell, and it is a bit of a pain to manage your work: the SCM is slow, web pages are not served at a decent speed, and the shell services, although more secure, are slower, with a shell-creation process that takes too long. So, to cut the story short, I finally decided to migrate some of my work from sourceforge to googlecode, simply because googlecode is faster and simpler.

In the beginning I had no clue how to achieve this task, but it turned out to be much simpler than I anticipated. My only concern was to get the code migrated fully, safely and with all the version history. This is done by syncing the project’s SVN repository on googlecode with the repository on sourceforge. First I reset the googlecode repository to enable svn syncing; this is done under the Administrator->Source tab on your project’s homepage on googlecode. Then I began the syncing process.

The first step is to initialize the googlecode’s subversion repository
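With svnsync the destination repository is initialized against the source repository. The project name, username and URLs below are placeholders for your own project:

```shell
svnsync init --username myuser \
    https://myproject.googlecode.com/svn \
    https://myproject.svn.sourceforge.net/svnroot/myproject
```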

After this we just start synch’ing the repositories.
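Syncing then only needs the destination URL (again a placeholder), and can be re-run later to pick up new revisions:

```shell
svnsync sync --username myuser https://myproject.googlecode.com/svn
```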

The above command will fetch all the code, with the version history, including tags and branches.

And that is all there is to it. For more information on svnsync, refer to the subversion redbook.

Oracle-IBM pact and Android

The IBM-Oracle pact is good news for Java developers and for the open source community in general. OpenJDK is a more natural open alternative to Oracle’s J2SE and is well backed by the Java community, so IBM’s move to shift “its development effort from the Apache project Harmony to OpenJDK” makes sense. And remember, IBM also wanted to acquire Sun mainly because of Java. Apache Harmony, on the other hand, never gained enough popularity because of the TCK-issuance tussle between Sun and Apache. But Harmony remains another open-source implementation of Java, free from legal infringements.

Now a lot of people see this pact as a threat to Google’s Android platform and associate it with Oracle’s lawsuit against Google, but I don’t think this is the case (although Oracle may think otherwise). First of all, OpenJDK is not built for mobile platforms; it could be a threat to Oracle’s own J2SE, but not to Android. Secondly, Google uses Apache Harmony, which has been rewritten and is open. Google also uses the Dalvik VM, a special virtual machine written from scratch for mobile devices and backed by the Open Handset Alliance. The Dalvik VM is also open-source and uses its own form of bytecode, which is “distinct and different from Java bytecode”. So Google is not using any Java component in the Android platform that would obligate them to require a license from Oracle.

As a developer, I think IBM’s move is good for Java being open. For Google it is an opportunity to start owning Harmony and to control its development; Google should seriously start investing in Harmony and capitalize on this opportunity. As for the lawsuit, I think Google can win it, if it is not thrown out of court, though they may be fined for “some incompatibilities” in their version of Java. The time the suit is going to take may cause some hardware vendors to slow down their production of Android phones, but demand will drive it and I don’t see that happening, because Android phones are much cheaper; a lot of Android phones are lined up for 2011, along with Android-3-based tablets.

Oracle, a patent troll, just wants money and will try to prolong the lawsuit as much as they can, to get some leverage from “time” and “doubts” about the future of Android. But in the end it is not going to matter much, because Oracle is not Android’s competitor and doesn’t have a “real” mobile platform, except for the dying J2ME, which they inherited from Sun.