Tuesday 17 July 2018

FIND-S Algorithm : In Python


Problem : Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.


Algorithm :
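The FIND-S algorithm, in its standard formulation from Tom Mitchell’s Machine Learning, is:
1. Initialise h to the most specific hypothesis in H.
2. For each positive training instance x: for each attribute constraint ai in h, if the constraint is satisfied by x, do nothing; otherwise, replace ai in h by the next more general constraint that is satisfied by x.
3. Output the hypothesis h.
Note that FIND-S simply ignores negative examples.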


Source Code : LinkFindS
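Since the program itself is only linked above, here is a minimal sketch of a FIND-S implementation in Python. It assumes a training file named data.csv in which every row is one example, the last column is the class label, and positive examples are labelled ‘yes’; the file name and label convention are assumptions, not taken from the linked code.

import csv

def find_s(path):
    # read all non-empty rows from the CSV training file
    with open(path) as f:
        rows = [row for row in csv.reader(f) if row]
    hypothesis = None
    for row in rows:
        attributes, label = row[:-1], row[-1].strip().lower()
        if label != 'yes':         # FIND-S ignores negative examples
            continue
        if hypothesis is None:     # first positive example: most specific h
            hypothesis = list(attributes)
        else:                      # generalise each mismatching constraint to '?'
            hypothesis = [h if h == a else '?'
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis

print(find_s('data.csv'))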

The Best Open Source Machine Learning Frameworks By Dr Anand Nayyar - January 17, 2017


Learning may be defined as the process of improving one’s ability to perform a task efficiently. Machine learning is a sub-field of computer science that enables computers to learn without being explicitly programmed. It evolved from artificial intelligence, via pattern recognition and computational learning theory, and studies algorithms that can learn from and make predictions on data. In recent times, machine learning has been deployed in a wide range of computing tasks where designing explicit algorithms and programs is difficult, such as email spam filtering, optical character recognition, search engine improvement, digital image processing and data mining.
Tom M. Mitchell, renowned computer scientist and professor at Carnegie Mellon University, USA, defined machine learning as: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Machine learning tasks are broadly classified into three categories, depending on the nature of the learning ‘signal’ or ‘feedback’ available to a learning system.
  • Supervised learning: This is the machine learning task of inferring a function from labelled training data. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (the supervisory signal).
  • Unsupervised learning: This is the machine learning task of inferring a function to describe hidden structure in unlabelled data. It is closely related to the problem of density estimation in statistics.
  • Reinforcement learning: This is the area of machine learning concerned with how software agents should take actions in an environment so as to maximise some notion of cumulative reward. It is applied to diverse areas like game theory, information theory, swarm intelligence, statistics and genetic algorithms. In machine learning, the environment is typically formulated as a Markov decision process (MDP), because many reinforcement learning algorithms rely on dynamic programming techniques.
The application of machine learning to diverse areas of computing is rapidly gaining popularity, not only because of cheap and powerful hardware, but also because of the increasing availability of free and open source software that makes machine learning easy to implement. Machine learning practitioners and researchers, as part of software engineering teams, continuously build sophisticated products, integrating intelligent algorithms into the final product to make software work more reliably, quickly and without hassle.
There is a wide range of open source machine learning frameworks available in the market, which enable machine learning engineers to build, implement and maintain machine learning systems, generate new projects and create new impactful machine learning systems.
Let’s take a look at some of the top open source machine learning frameworks available.
Apache Singa
The Singa Project was initiated by the DB System Group at the National University of Singapore in 2014, with a primary focus on distributed deep learning by partitioning the model and data onto nodes in a cluster and parallelising the training. Apache Singa provides a simple programming model and works across a cluster of machines. It is primarily used in natural language processing (NLP) and image recognition. A Singa prototype accepted by Apache Incubator in March 2015 provides a flexible architecture of scalable distributed training and is extendable to run over a wide range of hardware.
Apache Singa was designed with an intuitive programming model based on layer abstraction. A wide variety of popular deep learning models are supported, such as feed-forward models like convolutional neural networks (CNN), energy models like Restricted Boltzmann Machine (RBM), and recurrent neural networks (RNN).  Based on a flexible architecture, Singa runs various synchronous, asynchronous and hybrid training frameworks.
Singa’s software stack has three main components: Core, IO and Model. The Core component is concerned with memory management and tensor operations. IO contains classes for reading and writing data to the disk and the network. Model includes data structures and algorithms for machine learning models.
Its main features are:
  • Includes a tensor abstraction, which allows more advanced machine learning models to be supported
  • Supports device abstraction for running on varied hardware devices
  • Uses CMake for compilation rather than GNU autotools
  • Improved Python binding, and more deep learning models such as VGG and ResNet
  • Includes enhanced IO classes for reading, writing, encoding and decoding files and data
The latest version is 1.0.
Website: http://singa.apache.org/en/index.html
Shogun
Shogun was initiated by Soeren Sonnenburg and Gunnar Raetsch in 1999 and is currently under rapid development by a large team of programmers. This free and open source toolbox, written in C++, provides algorithms and data structures for machine learning problems. The Shogun Toolbox can be used through a unified interface from C++, Python, Octave, R, Java, Lua, C# and Ruby, and runs on Windows, Linux and macOS. Shogun is designed for unified large-scale learning across a broad range of feature types and learning settings, like classification, regression, dimensionality reduction and clustering. It contains a number of exclusive state-of-the-art algorithms, such as a wealth of efficient SVM implementations, multiple kernel learning, kernel hypothesis testing and Krylov methods.
Shogun supports bindings to other machine learning libraries like LibSVM, LibLinear, SVMLight, LibOCAS, libqp, VowpalWabbit, Tapkee, SLEP, GPML and many more.
Its features include one-class classification, multi-class classification, regression, structured output learning, pre-processing, built-in model selection strategies, visualisation and test frameworks; and semi-supervised, multi-task and large scale learning.
The latest version is 4.1.0.
Website: http://www.shogun-toolbox.org/
Apache Mahout
Apache Mahout, a free and open source project of the Apache Software Foundation, has the goal of developing free, distributed and scalable machine learning algorithms for diverse areas like collaborative filtering, clustering and classification. Mahout provides Java libraries and collections for various kinds of mathematical operations.
Apache Mahout is implemented on top of Apache Hadoop using the MapReduce paradigm. Once Big Data is stored on the Hadoop Distributed File System (HDFS), Mahout provides the data science tools to automatically find meaningful patterns in these Big Data sets, turning this into ‘big information’ quickly and easily.
  • Building a recommendation engine: Mahout provides tools for building a recommendation engine via the Taste library, a fast and flexible engine for collaborative filtering (CF).
  • Clustering with Mahout: Several clustering algorithms are supported by Mahout, like Canopy, k-Means, Mean-Shift, Dirichlet, etc.
  • Categorising content with Mahout: Mahout uses the simple Map-Reduce-enabled naïve Bayes classifier.
The latest version is 0.12.2.
Website: https://mahout.apache.org/
Apache Spark MLlib
Apache Spark MLlib is a machine learning library, the primary objective of which is to make practical machine learning scalable and easy. It comprises common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction as well as lower-level optimisation primitives and higher-level pipeline APIs.
Spark MLlib is regarded as a distributed machine learning framework on top of the Spark Core which, mainly due to the distributed memory-based Spark architecture, is almost nine times as fast as the disk-based implementation used by Apache Mahout.
The various common machine learning and statistical algorithms that have been implemented and included with MLlib are:
  • Summary statistics, correlations, hypothesis testing, random data generation
  • Classification and regression: support vector machines, logistic regression, linear regression, naïve Bayes classification
  • Collaborative filtering techniques including Alternating Least Squares (ALS)
  • Cluster analysis methods including k-means and Latent Dirichlet Allocation (LDA)
  • Optimisation algorithms such as stochastic gradient descent and limited-memory BFGS (L-BFGS)
The latest version is 2.0.1.
Website: http://spark.apache.org/mllib/
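As a small illustration, here is a minimal sketch of the RDD-based MLlib API in Python, running the k-means clustering mentioned above; it assumes a local Spark installation with pyspark available.

from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext("local", "mllib-kmeans-demo")
# four 2-D points forming two obvious clusters
data = sc.parallelize([[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]])
model = KMeans.train(data, k=2, maxIterations=10)
print(model.clusterCenters)   # two centres, near (0.5, 0.5) and (8.5, 8.5)
sc.stop()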
TensorFlow
TensorFlow is an open source software library for machine learning developed by the Google Brain Team for various sorts of perceptual and language understanding tasks, and to conduct sophisticated research on machine learning and deep neural networks. It is Google Brain’s second generation machine learning system and can run on multiple CPUs and GPUs. TensorFlow is deployed in various products of Google like speech recognition, Gmail, Google Photos and even Search.
TensorFlow performs numerical computations using data flow graphs, which express mathematical computations as a directed graph of nodes and edges. Nodes implement mathematical operations, and can also represent endpoints at which data is fed in, results are pushed out, or persistent variables are read and written. Edges describe the input/output relationships between nodes and carry dynamically sized multi-dimensional data arrays, or tensors.
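To make the graph model concrete, here is a minimal Python sketch, written against the TensorFlow 1.x-style graph API (note that in the 0.10 release described here, multiplication was spelled tf.mul rather than tf.multiply):

import tensorflow as tf

# two placeholder nodes: endpoints where data is fed into the graph
a = tf.placeholder(tf.float32, name="a")
b = tf.placeholder(tf.float32, name="b")
# an operation node; its incoming and outgoing edges carry tensors
c = tf.multiply(a, b)

# nothing is computed until the graph is executed in a session
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 3.0, b: 4.0}))   # prints 12.0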
Its features are listed below.
  • Highly flexible: TensorFlow enables users to write their own higher-level libraries on top of it by using C++ and Python, and express the neural network computation as a data flow graph.
  • Portable: It can run on varied CPUs or GPUs, and even on mobile computing platforms. It also supports Docker and running via the cloud.
  • Auto-differentiation: TensorFlow can automatically compute the derivatives needed for training, so the user only defines the computational architecture of the predictive model and its objective function, while the framework handles the gradient computations.
  • Diverse language options: It offers an easy Python based interface (with C++ also supported), and lets users write code and view visualisations of the data flow graphs.
The latest version is 0.10.0.
Website: www.tensorflow.org
Oryx 2
Oryx 2 is a realisation of Lambda architecture built on Apache Spark and Apache Kafka for real-time large scale machine learning. It is designed for building applications and includes packaged, end-to-end applications for collaborative filtering, classification, regression and clustering.
Oryx 2 comprises the following three tiers.
  • General Lambda architecture tier: Provides batch, speed and serving layers, which are not specific to machine learning.
  • A specialisation tier on top of this, which provides machine learning abstractions such as hyperparameter selection.
  • An end-to-end implementation of standard machine learning algorithms (ALS, random decision forests, k-means) as applications on top.
Oryx 2 consists of the following layers of Lambda architecture as well as connecting elements.
  • Batch layer: Used for computing new results from historical data and previous results.
  • Speed layer: Produces and publishes incremental model updates from a stream of new data.
  • Serving layer: Receives models and updates, and implements a synchronous API, exposing query operations on results.
  • Data transport layer: Moves data between layers and takes input from external sources.
The latest version is 2.2.1.
Website: http://oryx.io/
Accord.NET
Accord.NET is a .NET open source machine learning framework for scientific computing, and consists of multiple libraries for diverse applications like statistical data processing, pattern recognition, linear algebra, artificial neural networks, image and signal processing, etc.
The framework is divided into libraries via the installer, compressed archives and NuGet packages, which include Accord.Math, Accord.Statistics, Accord.MachineLearning, Accord.Neuro, Accord.Imaging, Accord.Audio, Accord.Vision, Accord.Controls, Accord.Controls.Imaging, Accord.Controls.Audio, Accord.Controls.Vision, etc.
Its features are:
  • A matrix library that increases code reusability and allows existing algorithms to be gradually adapted, operating over standard .NET structures.
  • Consists of more than 40 statistical distributions, along with models such as hidden Markov models and mixture models.
  • Consists of more than 30 hypothesis tests like ANOVA, two-sample, multiple-sample, etc.
  • Consists of more than 38 kernel functions, and kernel methods such as kernel support vector machines (KSVM), kernel principal component analysis (KPCA) and kernel discriminant analysis (KDA).
The latest version is 3.1.0.
Website: www.accord-framework.net
Amazon Machine Learning (AML)
Amazon Machine Learning (AML) is a machine learning service for developers. It provides visualisation tools and wizards for creating sophisticated machine learning models without having to learn complex ML algorithms and technologies. Via AML, applications can obtain predictions through simple APIs, without custom prediction-generation code or any infrastructure to manage.
AML is based on the same simple, scalable, dynamic and flexible ML technology used internally by Amazon’s own scientists to power Amazon’s cloud services. AML connects to data stored in Amazon S3, Redshift or RDS, and can run binary classification, multi-class categorisation or regression on this data to create models.
The key objects used in Amazon ML are listed below.
  • Datasources: Contain metadata associated with data inputs to Amazon ML.
  • ML models: Generate predictions using the patterns extracted from the input data.
  • Evaluations: Measure the quality of ML models.
  • Batch predictions: Asynchronously generate predictions for multiple input data observations.
  • Real-time predictions: Synchronously generate predictions for individual data observations.
Its key features are:
  • Supports multiple data sources within its system.
  • Allows users to create a data source object from data residing in Amazon Redshift – the data warehouse Platform as a Service.
  • Allows users to create a data source object from data stored in the MySQL database.
  • Supports three types of models: binary classification, multi-class classification and regression.
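To show how simple the prediction API is, here is a sketch in Python using the boto3 SDK (the SDK choice is ours; the article names none). The model ID and endpoint below are placeholders for the values AML returns when you create a model and enable real-time predictions.

import boto3

# assumes AWS credentials are already configured (e.g. via `aws configure`)
client = boto3.client("machinelearning")

response = client.predict(
    MLModelId="ml-XXXXXXXXXXXX",                      # placeholder model ID
    Record={"feature1": "value1", "feature2": "42"},  # one input observation
    PredictEndpoint="https://realtime.machinelearning.us-east-1.amazonaws.com",
)
print(response["Prediction"])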

Discover Ethereum, a Blockchain Based Computing Platform : By Prabal Banerjee - July 16, 2018



Dive into the world of blockchains, and use Ethereum to host a private network and deploy a smart contract.


Blockchain, as you might already know, is a tamper-resistant, append-only, distributed ledger. It is expected to be the next disruptive technology, although some argue that it is more of a foundational technology than a disruptive one. Bitcoin was the first large scale adoption of a blockchain based platform, but blockchain technology is capable of supporting much more than just cryptocurrencies. Ethereum provides an open source blockchain based distributed computing platform on which we can deploy apps that do much more than just financial transactions or money exchange. I’ve chosen Ethereum because it is very popular, currently ranking second behind Bitcoin in terms of market capitalisation. Besides, it is open source, provides a Turing complete virtual machine, the Ethereum Virtual Machine (EVM), on which to deploy applications, and has an active community to support it.
In this article, we will discuss how to set up a private network that will host the Ethereum platform, including the blockchain. We will also learn to deploy a very basic application and interact with it. First, let us discuss some key terms that will help us understand the later sections.
Smart contract: This is a piece of code that is deployed in a distributed manner across the nodes hosting the network. It administers laws and rules, determines penalties and enforces them across the network. It contains states that are stored on the blockchain.
Consensus: This is the process by which all the different nodes agree about a state change. It ensures that even if malicious parties behave arbitrarily, the honest parties would still be able to reach an agreement.
Transactions: Any change of state is proposed as a transaction. The transactions (individually or collectively) are consented upon and then written into the blockchain.
Queries: Reading states from a current blockchain is known as a query. It does not require consensus, and any node can perform a query by just reading the latest states on the blockchain.
Mining (in Proof of Work systems): A hard cryptographic puzzle needs to be solved before a node can take multiple transactions and form a block. This process is called mining, and the nodes that perform it are called miners. Because this is a computationally tough job, miners are incentivised with a mining reward given to the winner who successfully writes the block onto the blockchain (a toy sketch of such a puzzle follows these definitions).
Genesis: Every blockchain starts with a genesis block containing the basic rules of that chain. It mainly contains the system parameters.
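To make the mining definition above concrete, the following toy Python loop searches for a nonce whose hash has a required number of leading zeros. This is only an illustration of the proof-of-work idea; Ethereum’s real mining algorithm (Ethash) is far more involved.

import hashlib

def mine(block_data, difficulty=4):
    # find a nonce so that sha256(block_data + nonce) starts with
    # 'difficulty' zero hex digits: a toy stand-in for real mining
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256((block_data + str(nonce)).encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

print(mine("some pending transactions"))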
Prerequisites for installation
We first need a few tools to help us through the installation and deployment. Let’s assume you have Ubuntu 16.04 running on your system, although the tools and methods are quite generic and can be easily ported to other distributions or operating systems too.
Installing Git and Go: There are a few official implementations of the Ethereum protocol. We will build the Golang implementation from source, and to check out the latest sources we will need Git. Also, the Go version in the Ubuntu repositories is old, so we will use the gophers PPA for the latest version.
$ sudo add-apt-repository ppa:gophers/archive
$ sudo apt-get update
$ sudo apt-get install golang-1.10-go git
Setting up the environment variables: We need to set up some environment variables for Go to function correctly. So let us first create a folder which we will use as our workspace, and set up a GOPATH to that folder. Next, we will update our PATH variable so that the system recognises our compiled binaries and also the previously installed Golang.
$ cd ~
$ mkdir workspace
$ echo "export GOPATH=$HOME/workspace" >> ~/.bashrc
$ echo "export PATH=$PATH:$HOME/workspace/bin:/usr/local/go/bin:/usr/lib/go-1.10/bin" >> ~/.bashrc
$ source ~/.bashrc


Figure 1: Using Remix to deploy a contract and interact

Installing and running Ethereum
Let us first check out the latest version of geth (go-ethereum) with Git, and then build it from the source.
$ cd ~/workspace
$ git clone https://github.com/ethereum/go-ethereum.git
$ cd go-ethereum/
$ make geth
Upon successful completion of the last command, it will display a path to the geth binary. We don’t want to write the entire path again and again, so let’s add that to the PATH variable:
$ echo "export PATH=$PATH:$HOME/workspace/go-ethereum/build/bin" >> ~/.bashrc
$ source ~/.bashrc
Next, we need to create a genesis.json file that will contain the blockchain parameters. It will be used to initialise the nodes. All the nodes need to have the same genesis.json file. Here is a sample of a genesis.json file that we will use. It is saved in our workspace directory.
$ cd ~/workspace/
$ cat genesis.json
{
  "config": {
    "chainId": 1907,
    "homesteadBlock": 0,
    "eip155Block": 0,
    "eip158Block": 0,
    "byzantiumBlock": 0
  },
  "difficulty": "10",
  "gasLimit": "900000000000",
  "alloc": {}
}
Now, open a new terminal (say Terminal 1) and run our first Ethereum node on it. To enable that, first create a work folder for the node and a new account. Make sure you don’t forget the passphrase because that can be fatal on the main Ethereum network. In our case, because we are just creating our own private network, this is not such a serious concern. Then we supply a network ID:
$ cd ~/workspace/
$ mkdir node1
$ geth account new --datadir node1
$ geth init genesis.json --datadir node1
$ geth console --datadir node1 --networkid 1729
We should now be inside the Geth console. You can play around before proceeding, but for our purpose, let us head on to creating a second node. To do that, look at the enode details of our current node. On the Geth console, execute the following commands:
> admin.nodeInfo
{
enode: "enode://c2b8714eca73d7a5a4264fa60641a8791ff8d33e47dbb51f8b590594eb48e2aba9f360f340f358700e41e9d8415d7caf70c67d12a66096053989c3824f7f64c3@[::]:30303",
...
We just need the enode details and have to replace the [::] with the localhost IP 127.0.0.1.
Let’s fire up a new terminal, say Terminal 2, to host the second node. We proceed like before, and then supply the previous network ID, the bootnode details and a port number for the node to bind to. We will also enable RPC from a particular domain, which we will use later to connect to this node.
$ cd ~/workspace/
$ mkdir node2
$ geth account new --datadir node2
$ geth init genesis.json --datadir node2
$ geth console --datadir node2 --networkid 1729 --port 30304 --bootnodes "enode://c2b8714eca73d7a5a4264fa60641a8791ff8d33e47dbb51f8b590594eb48e2aba9f360f340f358700e41e9d8415d7caf70c67d12a66096053989c3824f7f64c3@[127.0.0.1]:30303" --rpc --rpccorsdomain "http://remix.ethereum.org"
If everything goes well, you will have two nodes connected to each other, hosting the same blockchain. To confirm, you can use ‘admin.peers’ inside the Geth consoles.
Until now, we just hosted the blockchain. We did not perform any transaction, mine blocks or transfer ether. In fact, all our accounts are empty. Let us now check our account balance and start mining on one of the nodes, say Terminal 1.
> eth.getBalance(eth.accounts[0])
0
> miner.start(1)
We see that our account balance is zero. Starting the mining process may take some time. If you want a break, now is the time, because generating the DAG may take a considerable amount of time. Once the mining starts, you can recheck the account balance to see it increasing. This is because you are getting the mining reward. You can see logs on the console such as ‘mined potential block’ and ‘commit new mining work’ that tell you that a new block has been mined. The other node will also sync the newly created blocks.
To transfer ether from one account to another, we have to unlock the account and transfer funds. For example:
> personal.unlockAccount(eth.accounts[0], "pass", 24*7*360)
true
> eth.sendTransaction({from:eth.accounts[0],
to:"0xba993910008b3940626c83135fa2412b4a91b3b1", value: web3.toWei(6, "ether")})
In the first command, the passphrase needs to be supplied in place of ‘pass’. The last argument is the time duration for unlocking, which is optional. But we don’t want this to bother us again and again.
In the second command, the ‘to’ field should contain the address of the account you want to send to. You can execute ‘eth.accounts[0]’ on Terminal 2 and supply the string here. Once done and once a new block is mined, you can see the change of funds reflected in your account balances. If it says insufficient funds, wait for some new blocks to be mined or decrease the amount. We have sent six ethers in the example shown.
Writing your first solidity contract
Now that we have set up the network to host the blockchain, let us deploy a smart contract and interact with it.
Fire up the browser and head to http://remix.ethereum.org/. We will use the online compiler to compile our code, connect through it to our localhost, and deploy on the private network. Close the default code (Ballot.sol) that is open on Remix, delete the .sol file from the browser dropdown on the left, and create a new file named Basic.sol in the browser. Then write the following piece of code in it.
pragma solidity ^0.4.19;

contract Basic {
    // declaring an unsigned integer; let's make it private
    uint private variable;

    // this function returns the value of the stored variable
    function get() constant public returns (uint retVal) {
        return variable;
    }

    // this function sets the stored variable to the value of the passed argument
    function set(uint x) public {
        variable = x;
    }
}
Now, compile to see if it successfully compiles without errors. Next, go to the ‘Run’ tab and select ‘Web3 Provider’ from the ‘Environment’ drop down menu. Choose the default endpoint, if asked.
On clicking ‘Create’, the contract will be deployed. But on the browser console, you’ll probably see an authentication error. This is because, to deploy, your account needs to be unlocked. Head over to the Terminal 2 Geth console and unlock the account as you did while transferring funds. Now, once done, try creating the contract from your browser again. (If you see an error message saying ‘Insufficient funds’, try sending more ethers from your node1 account to your node2 account.) Once deployed and mined, you will see the address and function names on your browser, as seen on the lower right of Figure 1.
Use the ‘get’ button to fetch the current value of the variable. Use ‘set’ to put some value into the variable, which you can check by using ‘get’ again. You will notice that ‘get’ gives the value instantaneously, whereas ‘set’ takes time. Also, on using ‘set’, a new transaction is seen on the Geth console on Terminal 2. This is because ‘get’ is a query that just returns the current state recorded on the blockchain, whereas ‘set’ attempts to change the state. Any state change must go through consensus, and hence we need to wait till the transaction is mined before we can see the change across the nodes.
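Remix is not the only way to interact with the deployed contract. As an alternative sketch (our addition; the article itself uses only Remix and the Geth console), the web3.py Python library can perform the same query and state change over the RPC endpoint we enabled on node2. The contract address and ABI below are placeholders for the values the Solidity compiler and Remix give you on deployment.

from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))
abi = []        # paste the ABI that Remix generates for Basic.sol here
address = "0x0000000000000000000000000000000000000000"  # deployed address
basic = w3.eth.contract(address=address, abi=abi)

# query: reads state directly, no transaction, returns instantly
print(basic.functions.get().call())

# state change: becomes a transaction that must be mined first
tx_hash = basic.functions.set(42).transact({"from": w3.eth.accounts[0]})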
The final steps
This article was meant to initiate you into the world of Ethereum. There is a lot more you can try on your way to building complex blockchain based projects. I would encourage those interested to try out the following.
  • Exploring the Geth console and expanding the private network to multiple nodes: You can try to make a node malicious or shut down abruptly, and see if and how it impacts the transactions.
  • Writing more elaborate smart contracts: You can send and receive funds to and from smart contracts. Try some of the projects at https://www.ethereum.org/.
  • Using the online compiler was one of the ways to deploy and interact. You can manually compile and deploy using ABI (https://github.com/ethereum/go-ethereum/wiki/Contract-Tutorial) or use a framework like Truffle (http://truffleframework.com/).
  • Geth is one of the flavours implemented in Go. You can use other language implementations like C++ and Python, look into the code and contribute. They are all open source (https://github.com/ethereum). You can even use GUI clients like Mist (https://github.com/ethereum/mist).
  • Use the testnet and main Ethereum network to test and deploy contracts: Warning: The main network involves real money. So be sure about what you are doing. Writing secure contracts is a tough job (https://medium.com/new-alchemy/a-short-history-of-smart-contract-hacks-on-ethereum-1a30020b5fd)!
  • Take a look at other smart contract based projects and learn from them (https://www.stateofthedapps.com/dapps/tab/most-viewed).

Ethereum has a very active community. In case you face issues, a quick Web search should fetch most answers. If not, try their Gitter, Reddit and Stack Exchange platforms. I hope this article helped demystify some of the blockchain buzzwords and gave readers a feel of what blockchain is about. It is still a developing technology, and you can contribute both in terms of theory and implementation.
Source : https://opensourceforu.com/2018/07/discover-ethereum-a-blockchain-based-computing-platform/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+LinuxForYou+%28OpenSourceForYou+%29

Activation function in Neural network

Back Propagation in Neural Network

Explained In A Minute: Neural Networks

Machine Learning for Flappy Bird using Neural Network & Genetic Algorithm

Neural Networks Explained - Machine Learning Tutorial for Beginners