Machine Learning- Additional Concepts

In last post I talked about Machine Learning basics. Here I will take up a few additional topics.

Before getting into Recommendation Engine, lets look at a few concepts
Association Rule mining: At times we try to figure out association or relationship between 2 events. For example, we might say that when someone buys milk he might buy bread as well.

Support: Support is an indication of how frequently the events/ itemset appears together. If Event A is buying of milk and Event B is buying of Bread.
Support = number of times bread and milk are bought together/ total number of transactions

Confidence: Confidence is an indication of how often the rule has been found to be true.
number of times A and B occurs/ A occurrences = Supp(A U B)/Supp(A)

Lift = Supp(A U B)/ Supp(A)*Supp(B)

Apriori Algorithm: This algorithm tries to find out items that can be grouped together. It starts from bottom, ie tries to find subsets which are found together and moves up, based on the concept that it a set has items that share a relationship, all subsets also follow the relationship.

Market Basket Analysis: Market Basket Analysis is a modelling technique based upon the theory that if you buy a certain group of items, you are more (or less) likely to buy another group of items.

Recommendation Engine: When we buy something from an ecommerce store or watch a video on video streaming site, we get various recommendation for our next purchase or view. This is done by recommendation engines running behind the scenes.

There are two common types of recommendations used by these engines

User based/ Collaborative Filtering: If you have watched a video on a site like netflix, the system will look for more users who watched the same video and try to figure out the next most common video most of the users watched and recommend it to you.
Content based Recommendation: Instead of user behavior, the engine will try to segregate the contents based on its properties, for example, movie genre, actors, directors etc. So if you watched an action movie, the engine will try to look for movie with similar content, actors, directors, plot etc.

More:
https://www.analyticsvidhya.com/blog/2015/10/recommendation-engines/

Text Mining: Not always one will get clean and ready to use data which can be fed to algorithms to start analysis with. A common use case for text mining is Review. Customer provide review about the products or movies in plain english. It is a tricky task to make machine analyze these texts and figure out what is the opinion being shared.

Here are a few common techniques used for text mining.

Bag of Words: This is a simple modeling technique which try to figure out word counts. Data cleanup is done by removing punctuation, stop words (to, a, the etc), white spaces etc. After that algorithm runs and find out count of each word used. This technique is good for tagging documents by understanding common word patterns being used.

TD-IDF: Term frequency- Inverse Document Frequency
Term Frequency = number of times a term occurred in a document/ total terms in document
Inverse document frequency = total number of documents/ number of documents with term ‘T’

Sentiment Analysis: Here we try to figure out the sentiments, for example if a review provided is positive or negative. If there are terms such as- like, loved, enjoyed etc, we might want to consider it as positive review.

Time Series:
There are problems when we need to handled time based data for example stock prices. We will need to store data in time series where each value is associated to a point in time.
When analyzing time series data, there are a few interesting insights one looks for

Trend: one looks for data movement with respect to time in upward trends, downward trends, horizontal trends etc.
Seasonality: Looks for same pattern during the year. For example fruit prices go up during winter every year.
Cyclic patterns: At times the patterns go beyond the year, eg. a particular trend is seen in 3 years or so. The pattern might not have a fixed time period.
Stationarity: Refers to stability of mean value though there is no specific pattern.

Reinforcement Learning: This is based on reward and penalty approach. The agent / engine is intelligent, which is provided constant feedback if the decision taken was correct or not. Agent can improve future decisions based on feedback provided.
Deep Learning: Sometimes referred to as Artificial Neural network(ANN), one can think of it as artificially replicating a Biological Neural Network (BNN).
There are three core areas of a BNN, a Dendrite (receive signals from neurons), Soma(sums up all the signals) and Axon(signals are transmitted through Axon). ANN tries to replicate the BNN by implementing artificial neurons. Read More- https://en.wikipedia.org/wiki/Artificial_neural_network