Sometimes a PDF contains only images. In such cases you cannot select text to copy and paste, or even just to reference it. To extract text from an image or from a PDF containing only images, I used the Tesseract OCR engine and Ghostscript. I am running Fedora 19 at the moment, but these steps should also apply to older versions of Fedora or to Ubuntu. (I believe this can be done on Windows as well.)
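As a rough illustration of the idea (not necessarily the exact steps from the full post), here is a minimal Python sketch that builds the two command lines involved: Ghostscript rasterizes each PDF page to a PNG, and Tesseract then reads each PNG and writes a text file next to it. The file names, the 300 DPI resolution, the png16m device and the "eng" language are my assumptions for the example.

```python
from pathlib import Path


def ghostscript_cmd(pdf, out_dir, dpi=300):
    """Build a Ghostscript command that renders every page of `pdf` to a PNG.

    The page-%03d.png naming pattern and the 300 DPI default are assumptions.
    """
    pattern = str(Path(out_dir) / "page-%03d.png")
    return [
        "gs", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=png16m", f"-r{dpi}",
        f"-sOutputFile={pattern}",
        str(pdf),
    ]


def tesseract_cmd(image, lang="eng"):
    """Build a Tesseract command; the text is written to <image stem>.txt."""
    image = Path(image)
    output_base = image.with_suffix("")  # Tesseract appends ".txt" itself
    return ["tesseract", str(image), str(output_base), "-l", lang]


if __name__ == "__main__":
    # Only print the commands here; pass them to subprocess.run(cmd, check=True)
    # to execute them, which requires gs and tesseract to be installed.
    print(" ".join(ghostscript_cmd("scan.pdf", ".")))
    print(" ".join(tesseract_cmd("page-001.png")))
```

Higher resolutions generally give the OCR engine more to work with, at the cost of larger intermediate images.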

Read more →

I attended Cryptoparty (Delhi) a couple of weeks back, where I came to know about some really nice add-ons for Firefox. I am listing them below: HTTPS Everywhere, Ad Block Edge, Disconnect, and Collusion. I hope you find these tools useful. Now you can discover how much your clicks and visits to various sites are being recorded. Enjoy! References:

Read more →

I had to use Juniper VPN, but apparently it doesn’t work well from Linux machines. To complicate matters further, the underlying VPN software is 32-bit, but I am running a 64-bit OS. I have outlined the setup steps here: Juniper VPN setup on Fedora 17 x86_64: https://gist.github.com/tuxdna/5926030 Although these steps are written specifically for Fedora 17/18 x86_64, it should be easy to connect from Ubuntu machines too.

Read more →

The only source of Internet connection I currently have is my phone. I wanted to share this network with other systems via a LAN/wireless router. So here is the basic setup: an Android phone with USB tethering; a laptop (Fedora 18); a wireless router with LAN; a desktop machine (Fedora 17).
Setting up Internet gateway: Laptop + Android Phone
I connected the Android phone to the laptop via USB cable and enabled USB tethering, so I have a USB device /dev/usb0.

Read more →

I was wondering if I could get a sample out of many huge Lucene indexes and inspect it with Lukeall on my machine. I quickly realized that copying such indexes over the network would be time-consuming. First I googled for a ready-made solution so that I could copy only a few documents from the whole index into a separate (small) index. That way I could quickly understand the document structure.

Read more →

I was wondering whether or not to learn the Play2 Framework. I already know Ruby on Rails. The main criterion right now is to run a web application that will use lots of code already written in Java. Which one of Rails/JRuby or Play2 would be better when deploying on the JVM? That is my main question. So far I have stumbled upon similar questions, as below: "Play vs JRuby on Rails?" and "Advantages of Play framework for people coming from PHP / Ruby / Python". There are two criteria that I considered for selecting Play2:

Read more →

I was trying the first Akka example here. Install the Typesafe stack on an RPM distro:
$ sudo yum install typesafe-stack
Set up a g8 project:
$ g8 typesafehub/akka-scala-sbt
Akka 2.0 Project Using Scala and sbt
organization [org.example]: in.tuxdna
name [Akka Project In Scala]: akka-pi
akka_version [2.0.1]:
version [0.1-SNAPSHOT]:
Then I added Pi.scala to the src/main/scala folder. Next I tried to compile it from within Emacs, to check whether it would be easier to navigate compilation errors in the code.

Read more →

Students at JMI University organized JDevDay (formerly CONF@JMI). This was a two-day conference on FOSS topics. The agenda is listed here: https://jdevday.wordpress.com/2013/03/05/talks-in-jdevday/ I spoke on «Introduction to Scala»; the slides are uploaded to SlideShare and GitHub. It was so nice to meet all the speakers and attendees. I took a few photos, which I have uploaded to Flickr. JDevDay Rocks!

Read more →

The last time I created a presentation in Org Mode was in 2010, here. Creating a presentation in Emacs Org Mode and publishing it as LaTeX, PDF, and HTML is so awesome. Here are the links to get you started: http://orgmode.org/worg/exporters/beamer/tutorial.html and http://orgmode.org/worg/org-tutorials/non-beamer-presentations.html Then follow these simple steps (I am using Ubuntu right now, but the steps should be easy on Fedora too): Create a slides.org file as in the tutorial above:

Read more →

Where the heck did agave go from the Fedora 17 repos? Yes, I am asking the same question that has already been asked in the Fedora forum [1]. Apparently it doesn’t have any maintainer [2], hence no build for Fedora 17.
[1] http://forums.fedoraforum.org/showthread.php?t=276162
[2] https://admin.fedoraproject.org/pkgdb/acls/name/agave
[3] http://home.gna.org/colorscheme/

Read more →

I had used RSpec earlier for Behaviour Driven Development on a Ruby on Rails project. Today I learnt how to do BDD in Scala. Chapter 4 of Programming Scala introduces Traits and Specs for testing the code. Here is how I set it up: I already had Scala installed, so the first step was to set up sbt (Simple Build Tool for Scala). Setup was easy (described in detail here):

Read more →

Apache Mesos, written primarily in C++, is a cluster resource allocation framework. It is used by the Spark Project (the majority of the Spark project is written in Scala).
Apache Mesos - http://incubator.apache.org/mesos/
Spark Project - http://spark-project.org/
Tech Talk: Matei Zaharia (UC Berkeley) – «Spark: A Framework for Iterative and Interactive Cluster Computing» - http://vimeo.com/20757432

Read more →

I am running Scala on Fedora 17 and I seem to face a problem. I created a /tmp/hello.scala file, as follows: println("Hello world") This works just fine if I run it on Ubuntu 12.04 i.e. it prints Hello world. However, when I try to run it on Fedora, I don’t get any output and the script keeps waiting forever. So I tried to inspect what’s happening here: $ bash -x scala /tmp/hello.

Read more →

Obviously, the first step is to install the Scala language:
$ sudo aptitude install scala
OR
$ yum install scala
Then I ran my first Scala «Hello world!» program from the CLI. Setting up Scala mode for Emacs was a bit of a pain, so I merged the old scala-mode and the latest one into my repo. Here are very simple steps to set up scala-mode for Emacs:
$ cd ~/.emacs.d/
$ git clone git://github.

Read more →

I was reading a few blog posts about distributed, large-scale processing of data, be it in batch or real-time. And the move is definitely towards real-time now (here and here). Well, in this blog post I am only going to mention the things that I have come across so far. I would like to learn more. All the buzz around large-scale data processing, in some way or the other, seems to be inspired by papers published by Google or the systems they built.

Read more →

Indexing the documents stored in a database
Outline:
Set up a MySQL database [1] containing documents (PDF/DOC/HTML etc.).
Set up Apache Solr / Tika.
Import the documents just by hitting an import URL.
NOTE: Also check the update note at the end of this post. These steps were done on my machine running Fedora 17. The commands can be easily converted for other distributions.
Set up MySQL database with documents

Read more →

For some time now I have been working on a project called JCallTracer. I had a simple problem at hand: generate sequence diagrams for a program written in Java. I did try to google such a tool, but I couldn’t find anything that was Open Source and worked on Linux. The closest I could find was Java Call Tracer. This tool was designed for Windows users and didn’t compile on Linux. I fixed that, but then it was apparently designed for Java programs with a small memory footprint.

Read more →

Today I got to know that C has a new standard, released in 2011. You can find a detailed Dr. Dobb’s article on the subject. So far I haven’t come across any Open Source compiler that fully implements C11 features. Clang and GCC are yet to fully support this standard. Clang has added support for anonymous structs and anonymous unions: Clang 3.1 adds support for anonymous structs and anonymous unions, added in the latest ISO C standard.

Read more →

I was going through a list of Apache Incubator projects and I found a few really interesting ones, primarily because I could immediately relate them to some functionality I could readily use. However, I have to say that the layout of the Apache Incubator projects page makes it a daunting task to visit each and every project link to find out which technology or domain a project name relates to. If instead of a project name matrix, there was a simple project list with Project Name, Technologies, Domain etc.

Read more →

I wanted to know how much memory the C++ standard library consumes for a process running on Linux. I could not find a straightforward way, so I have written a small script to do exactly that. Script location: https://gist.github.com/4215536
How to use?
$ wget https://raw.github.com/gist/4215536/6ae899f454fd72ba3b6202724e15f855f80e33b3/mem-usage.rb
$ ruby ./mem-usage.rb /proc/5952/maps | grep libstd
/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16: 2988.0 KBs
In the above example, 5952 is the PID of the Thunderbird mail client, and the C++ standard library consumes 2988 KB of memory for this process.
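For readers who want to see the general idea without fetching the gist, here is a minimal Python sketch of one way to get such numbers. It is not the Ruby script above, and I am assuming that adding up the address ranges listed in /proc/PID/maps per backing file is a reasonable approximation. Note that this measures mapped address space rather than resident memory; the function name and the sample data are made up for illustration.

```python
from collections import defaultdict


def mapped_kb_by_path(maps_text):
    """Sum the size of every mapped address range per backing file, in KB."""
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, device, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname to attribute it to
        start, end = (int(x, 16) for x in fields[0].split("-"))
        totals[fields[5].strip()] += end - start
    return {path: size / 1024 for path, size in totals.items()}


# Hypothetical sample in the /proc/PID/maps format (not from a real process).
SAMPLE = """\
7f0000000000-7f00000e8000 r-xp 00000000 08:01 1234 /usr/lib/libexample.so
7f00000e8000-7f00000f0000 rw-p 000e8000 08:01 1234 /usr/lib/libexample.so
7f00000f0000-7f0000100000 rw-p 00000000 00:00 0
"""

if __name__ == "__main__":
    for path, kb in mapped_kb_by_path(SAMPLE).items():
        print(f"{path}: {kb:.1f} KB")
```

To inspect a real process, pass the contents of its maps file instead of SAMPLE, for example open("/proc/5952/maps").read().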

Read more →