Apache Mahout and Scala Programming Language

Saleem Ansari (`@tuxdna`)

Some use-cases:

- **Product Recommendation**: understanding / inferring what your customers are looking for
- **Topic Modeling**: identifying topics from documents
- **Frequent Patterns Mining**: knowing which entities occur together very often
- **Clustering**: grouping similar items or grouping very similar documents, which are perhaps talking about the same subject
- **Regression and Classification**: predicting house prices, or identifying a class of an item viz. product, document, person etc.
- And many more

- Similarity and Distance metrics
- Vectors and Matrices
- Statistics
- Probability

Different Similarity metrics

- Pearson correlation
- Euclidean distance
- Cosine measure
- Spearman correlation
- Tanimoto coefficient
- Log likelihood test
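
A few of the metrics listed above can be sketched in plain Scala. This is an illustrative sketch only, not Mahout's implementation: the object name `SimilaritySketch` and all method names are made up for this example, vectors are assumed to have equal length and non-zero norm, and Spearman correlation and the log likelihood test are left out for brevity.

```scala
object SimilaritySketch {
  type Vec = Seq[Double]

  private def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Euclidean distance: square root of the summed squared differences.
  def euclideanDistance(a: Vec, b: Vec): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Cosine measure: dot product divided by the product of the norms.
  // Returns NaN if either vector has zero norm.
  def cosine(a: Vec, b: Vec): Double =
    dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

  // Pearson correlation: cosine measure of the mean-centered vectors.
  def pearson(a: Vec, b: Vec): Double = {
    val (ma, mb) = (a.sum / a.size, b.sum / b.size)
    cosine(a.map(_ - ma), b.map(_ - mb))
  }

  // Tanimoto coefficient on sets: size of intersection over size of union.
  def tanimoto[T](a: Set[T], b: Set[T]): Double =
    (a intersect b).size.toDouble / (a union b).size
}
```

For example, `SimilaritySketch.tanimoto(Set(1, 2, 3), Set(2, 3, 4))` compares two item sets that share two of four distinct items.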

Distance to Similarity conversion ( not the only way )

```
s = 1 / ( 1 + d )
```
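
As a small sketch of this conversion in Scala (the object and method names are hypothetical, and the non-negativity check is an assumption added for the example):

```scala
object DistanceToSimilarity {
  // s = 1 / (1 + d): a distance of 0 maps to similarity 1,
  // and the similarity approaches 0 as the distance grows.
  def similarity(d: Double): Double = {
    require(d >= 0.0, "distance must be non-negative")
    1.0 / (1.0 + d)
  }
}
```

The mapping is monotonically decreasing, so ranking items by increasing distance or by decreasing similarity gives the same order.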

What are the stats almost everyone knows?

- mean / average / expectation
- median
- mode

What about these?

- variance
- standard deviation
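
These statistics can be sketched in plain Scala as follows. The object `BasicStats` and its method names are made up for illustration, inputs are assumed to be non-empty, and the variance shown is the population variance (dividing by n); a sample variance would divide by n - 1 instead.

```scala
object BasicStats {
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Median: middle value of the sorted data
  // (average of the two middle values when the size is even).
  def median(xs: Seq[Double]): Double = {
    val s = xs.sorted
    val n = s.size
    if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
  }

  // Mode: the most frequent value (an arbitrary one if there is a tie).
  def mode(xs: Seq[Double]): Double =
    xs.groupBy(identity).maxBy(_._2.size)._1

  // Population variance: mean of the squared deviations from the mean.
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / xs.size
  }

  // Standard deviation: square root of the variance.
  def stdDev(xs: Seq[Double]): Double = math.sqrt(variance(xs))
}
```

For the data `Seq(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)` the mean is 5, the median 4.5, the mode 4, the variance 4 and the standard deviation 2.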

- Conditional Probability:
`P(A|B) = num(A intersection B) / num(B)`

- Bayes Rule:
`P(A|B) = P(B|A) * P(A) / P(B)`

- Probability Distribution: PMF for discrete, PDF for continuous variables
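
A worked example may help here. The sketch below counts outcomes over the numbers 1 to 10, with event A = "even" and event B = "greater than 5" (both chosen only for illustration, as are all names in the code). It computes `P(A|B)` directly by counting, and again through Bayes' rule in its full form, `P(A|B) = P(B|A) * P(A) / P(B)`.

```scala
object ProbabilitySketch {
  // Ten equally likely outcomes.
  val outcomes: Set[Int] = (1 to 10).toSet
  val a: Set[Int] = outcomes.filter(_ % 2 == 0) // event A: even
  val b: Set[Int] = outcomes.filter(_ > 5)      // event B: greater than 5

  // Probability of an event as a fraction of all outcomes.
  def p(event: Set[Int]): Double = event.size.toDouble / outcomes.size

  // P(X|Y) = num(X intersection Y) / num(Y)
  def conditional(x: Set[Int], y: Set[Int]): Double =
    (x intersect y).size.toDouble / y.size

  // P(A|B) by direct counting: {6, 8, 10} out of {6, 7, 8, 9, 10}.
  val direct: Double = conditional(a, b)

  // P(A|B) via Bayes' rule: P(B|A) * P(A) / P(B).
  val viaBayes: Double = conditional(b, a) * p(a) / p(b)
}
```

Both routes give 3/5, since three of the five numbers greater than 5 are even.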

- Vector
- Matrix

( see the bindings )
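
To make the vector and matrix operations concrete without depending on Mahout, here is a plain-Scala sketch built on the standard library. It does not use Mahout's Scala bindings; the names `LinearAlgebraSketch`, `dot`, `times` and `transpose` are hypothetical, and matrices are assumed to be row-major with consistent dimensions.

```scala
object LinearAlgebraSketch {
  type Vec = Vector[Double]
  type Mat = Vector[Vec] // row-major: each inner vector is one row

  // Dot product of two vectors of equal length.
  def dot(a: Vec, b: Vec): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Matrix-vector product: one dot product per row.
  def times(m: Mat, v: Vec): Vec = m.map(row => dot(row, v))

  // Transpose: rows become columns.
  def transpose(m: Mat): Mat = m.transpose
}
```

For instance, `LinearAlgebraSketch.times(Vector(Vector(1.0, 2.0), Vector(3.0, 4.0)), Vector(1.0, 1.0))` sums each row of the matrix.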

- Naive Bayes Classifier
- Clustering the Synthetic Control Data
- Recommendation Algorithms

- No further development in Map-Reduce ( Hadoop ) style, although existing algorithms will remain.
- Existing MR algorithms to be ported from MR1 to MR2.
- All new algorithms will use the Scala Math DSL, which can be run seamlessly over Hadoop, Spark or anything else.

