Machine Learning with Ruby

Machine Learning with
  _____       _           
 |  __ \     | |          
 | |__) |   _| |__  _   _ 
 |  _  / | | | '_ \| | | |
 | | \ \ |_| | |_) | |_| |
 |_|  \_\__,_|_.__/ \__, |
                     __/ |
                    |___/     Programming Language

Saleem Ansari (@tuxdna)

http://tuxdna.in/

Outline

What is Machine Learning?

Some Math

Demo

What is Machine Learning?

  • 1959 (Arthur Samuel) - The field of study that gives computers the ability to learn without being explicitly programmed.

  • 1998 (Tom Mitchell) - A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

What can we achieve?

  • Product Recommendation: understanding / inferring what your customers are looking for
  • Clustering: grouping similar items, such as documents that discuss the same subject
  • Regression and Classification: predicting house prices, or identifying the class of an item (a product, document, person, etc.)
  • Topic Modeling: identifying topics from documents
  • Frequent Pattern Mining: knowing which entities occur together very often
  • And many more

Classification

Clustering

Recommendation Algorithms

User Based

Item Based
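
A user-based recommender can be sketched in a few lines of plain Ruby. This is only an illustration under stated assumptions: the ratings hash, the user and movie names, and the helper names `similarity` and `recommend` are all hypothetical, and the similarity used here (Euclidean distance over co-rated items, converted with s = 1 / (1 + d)) is just one possible choice. An item-based variant would apply the same idea with the roles of users and items swapped.

```ruby
# Hypothetical ratings: user => { item => rating }.
ratings = {
  "alice" => { "Alien" => 5, "Brazil" => 4 },
  "bob"   => { "Alien" => 5, "Brazil" => 4, "Casablanca" => 5, "Dune" => 2 },
  "carol" => { "Alien" => 1, "Brazil" => 2, "Casablanca" => 1, "Dune" => 5 }
}

# Similarity between two users, based on the items both have rated.
# Euclidean distance d is turned into a similarity with s = 1 / (1 + d).
def similarity(ratings_a, ratings_b)
  shared = ratings_a.keys & ratings_b.keys
  return 0.0 if shared.empty?

  d = Math.sqrt(shared.sum { |item| (ratings_a[item] - ratings_b[item])**2 })
  1.0 / (1.0 + d)
end

# User-based recommendation: score every item the user has not rated
# by the similarity-weighted average of the other users' ratings.
def recommend(user, ratings)
  totals  = Hash.new(0.0)
  weights = Hash.new(0.0)

  ratings.each do |other, other_ratings|
    next if other == user

    sim = similarity(ratings[user], other_ratings)
    next if sim.zero?

    other_ratings.each do |item, rating|
      next if ratings[user].key?(item)

      totals[item]  += sim * rating
      weights[item] += sim
    end
  end

  # Highest predicted rating first.
  totals.map { |item, total| [item, total / weights[item]] }
        .sort_by { |_, score| -score }
end

recommendations = recommend("alice", ratings)
recommendations.each { |item, score| puts format("%-12s %.2f", item, score) }
```

Because bob's ratings match alice's exactly on the shared movies, his opinions dominate the weighted average, so Casablanca is ranked ahead of Dune.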

Basic Ideas

  • Similarity and Distance metrics
  • Vectors and Matrices
  • Statistics
  • Probability

Similarity / Distance metrics

Different Similarity metrics

  • Pearson correlation
  • Euclidean distance
  • Cosine measure
  • Spearman correlation
  • Tanimoto coefficient
  • Log likelihood test
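
Three of the metrics listed above are short enough to sketch in plain Ruby. The method names are illustrative, not taken from any library; the sketch assumes Ruby 2.4 or later (for `Array#sum`), equal-length numeric arrays for the cosine and Pearson measures, and non-zero, non-constant inputs so that no denominator is zero.

```ruby
# Cosine measure: dot product divided by the product of the vector lengths.
def cosine_similarity(a, b)
  dot    = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  dot / (norm_a * norm_b)
end

# Pearson correlation: covariance scaled by both standard deviations.
def pearson_correlation(a, b)
  n      = a.size.to_f
  mean_a = a.sum / n
  mean_b = b.sum / n
  cov    = a.zip(b).sum { |x, y| (x - mean_a) * (y - mean_b) }
  var_a  = a.sum { |x| (x - mean_a)**2 }
  var_b  = b.sum { |y| (y - mean_b)**2 }
  cov / Math.sqrt(var_a * var_b)
end

# Tanimoto coefficient on two collections of items:
# size of the intersection divided by size of the union.
def tanimoto_coefficient(items_a, items_b)
  shared = (items_a & items_b).size
  shared.to_f / (items_a | items_b).size
end

puts cosine_similarity([1, 0], [0, 1])             # orthogonal vectors
puts pearson_correlation([1, 2, 3], [2, 4, 6])     # perfectly correlated
puts tanimoto_coefficient(%w[a b c], %w[b c d])    # 2 shared out of 4 distinct
```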

Distance to similarity conversion (one of several possible ways):

s = 1 / ( 1 + d )
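
As a small worked example of this conversion, the sketch below computes a Euclidean distance and maps it to a similarity with s = 1 / (1 + d). The method names are illustrative only; a distance of 0 gives similarity 1, and the similarity shrinks toward 0 as the distance grows.

```ruby
# Euclidean distance between two equal-length numeric arrays.
def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Convert a non-negative distance to a similarity in (0, 1].
def distance_to_similarity(d)
  1.0 / (1.0 + d)
end

d = euclidean_distance([1, 2], [4, 6])  # legs 3 and 4, so d is 5.0
s = distance_to_similarity(d)           # 1 / (1 + 5)

puts "distance:   #{d}"
puts "similarity: #{s.round(4)}"
```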

Similarity / Distance metrics contd...

Similarity Metric Selection

Matrix

Vector

Statistics

What are the stats almost everyone knows?

  • mean / average / expectation
  • median
  • mode

What about these?

  • variance
  • standard deviation
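
All five statistics fit in a few lines of plain Ruby. The sketch below is a minimal illustration with made-up method names and sample data; it computes the population variance (dividing by n rather than n - 1), and when several values tie for the mode it simply returns one of them.

```ruby
def mean(xs)
  xs.sum.to_f / xs.size
end

# Middle value of the sorted data (average of the two middle values if n is even).
def median(xs)
  sorted = xs.sort
  mid    = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Most frequent value.
def mode(xs)
  xs.group_by(&:itself).max_by { |_, group| group.size }.first
end

# Population variance: mean of the squared deviations from the mean.
def variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / xs.size
end

# Standard deviation: square root of the variance.
def standard_deviation(xs)
  Math.sqrt(variance(xs))
end

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

puts "mean:     #{mean(data)}"
puts "median:   #{median(data)}"
puts "mode:     #{mode(data)}"
puts "variance: #{variance(data)}"
puts "std dev:  #{standard_deviation(data)}"
```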

Probability

  • Conditional Probability: P(A|B) = P(A ∩ B) / P(B), which equals num(A ∩ B) / num(B) when all outcomes are equally likely
  • Bayes Rule: P(A|B) = P(B|A) P(A) / P(B)
  • Probability Distribution: PMF for discrete, PDF for continuous variables
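
Conditional probability and Bayes' rule can be checked by counting outcomes of a fair die. The events and helper names below are chosen purely for illustration, and the counting approach assumes equally likely outcomes. Bayes' rule is used in its full form, P(A|B) = P(B|A) P(A) / P(B), and `Rational` keeps the arithmetic exact.

```ruby
outcomes = (1..6).to_a                        # one roll of a fair die
event_a  = outcomes.select(&:even?)           # A: the roll is even
event_b  = outcomes.select { |x| x > 4 }      # B: the roll is greater than 4

# P(E) by counting, assuming equally likely outcomes.
def probability(event, outcomes)
  Rational(event.size, outcomes.size)
end

# P(A|B) = num(A intersection B) / num(B)
def conditional_probability(a, b)
  Rational((a & b).size, b.size)
end

p_a_given_b = conditional_probability(event_a, event_b)
p_b_given_a = conditional_probability(event_b, event_a)

# Bayes' rule: recover P(A|B) from P(B|A), P(A) and P(B).
bayes = p_b_given_a * probability(event_a, outcomes) / probability(event_b, outcomes)

puts "P(A|B) by counting: #{p_a_given_b}"
puts "P(A|B) via Bayes:   #{bayes}"
```

Both routes give the same value, which is the point of the rule: it lets P(A|B) be computed when only P(B|A) and the two marginal probabilities are known.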

Matrix and Vector in Ruby

  • Vector
  • Matrix

(see the Ruby docs)

  • What about a SparseMatrix?
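
Ruby's `matrix` library provides both `Vector` and `Matrix` (it ships as a bundled gem in recent Ruby versions, so `require "matrix"` normally works out of the box). It has no sparse matrix, so the `SparseMatrix` class below is a hypothetical sketch, not a library API: it stores only non-zero cells in a Hash keyed by `[row, col]`.

```ruby
require "matrix"

v = Vector[1, 2, 3]
w = Vector[4, 5, 6]

puts v.inner_product(w)        # dot product of v and w
puts Vector[3, 4].magnitude    # Euclidean length

m = Matrix[[1, 2], [3, 4]]

puts m * Vector[1, 1]          # matrix-vector product
puts m.transpose
puts m.determinant

# Hypothetical sparse matrix: only non-zero cells are stored.
class SparseMatrix
  def initialize
    @cells = Hash.new(0)       # missing cells read as 0
  end

  def [](row, col)
    @cells[[row, col]]
  end

  def []=(row, col, value)
    if value.zero?
      @cells.delete([row, col])  # never store explicit zeros
    else
      @cells[[row, col]] = value
    end
  end

  # Number of cells actually held in memory.
  def stored_count
    @cells.size
  end
end

sparse = SparseMatrix.new
sparse[0, 0] = 5
sparse[1_000, 2_000] = 3
puts sparse[1_000, 2_000]
puts sparse[7, 7]              # unset cell reads as 0
puts sparse.stored_count
```

A Hash-backed layout like this keeps memory proportional to the number of non-zero entries, which suits ratings or term-document data where most cells are empty.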

Demo

  • Generating Recommendations on MovieLens data
  • Statistical News Classifier
  • Clustering the Reuters news data
  • Naive Bayes Classifier

What's next?

Challenges to be solved within the Ruby ecosystem:

  • Fast Math
  • Easy Plotting
  • Integrated Environment for ML

Is there any hope?

  • AI4R
  • SciRuby
  • NMatrix
  • JRuby and Apache Mahout

Questions

Thanks and happy coding :-)
