H2O - Introduction



Have you ever been asked to build a Machine Learning model on a huge database? Typically, the client provides the database and asks you to make certain predictions, such as who the potential buyers will be or whether fraudulent cases can be detected early. To answer these questions, your task would be to develop a Machine Learning algorithm that addresses the customer’s query. Developing a Machine Learning algorithm from scratch is not an easy task, and why should you do this when several ready-to-use Machine Learning libraries are available?

Nowadays, you would rather use one of these libraries, apply a well-tested algorithm from it, and look at its performance. If the performance is not within acceptable limits, you would either fine-tune the current algorithm or try an altogether different one.

Likewise, you may try multiple algorithms on the same dataset and then pick the one that best meets the client’s requirements. This is where H2O comes to your rescue. It is an open source Machine Learning framework with fully tested implementations of several widely accepted ML algorithms. You just have to pick an algorithm from its large repository and apply it to your dataset. It contains the most widely used statistical and ML algorithms.

To mention a few, it includes gradient boosted machines (GBM), generalized linear models (GLM), deep learning, and many more. It also supports AutoML functionality that ranks the performance of different algorithms on your dataset, reducing the effort of finding the best-performing model. H2O is used worldwide by more than 18,000 organizations and interfaces well with R and Python for ease of development. It is an in-memory platform, which helps it deliver fast performance.

In this study note, you will first learn to install H2O on your machine with both the Python and R options. We will see how to use it from the command line so that you can follow its working line by line. If you are a Python lover, you may use Jupyter or any other IDE of your choice for developing H2O applications. If you prefer R, you may use RStudio for development.

In this study note, we will work through an example to understand how to go about working with H2O. We will also learn how to change the algorithm in your program code and compare its performance with the earlier one. H2O also provides a web-based tool for testing different algorithms on your dataset. This is called Flow.
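To make the idea of swapping algorithms and comparing their performance concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the example developed later in this study note: the function names `pick_best` and `compare_models` are hypothetical, the target column is assumed to be binary, and AUC is assumed to be the metric of interest. It also assumes the `h2o` Python package and a Java runtime are installed.

```python
def pick_best(scores):
    """Return the (name, score) pair with the highest score."""
    return max(scores.items(), key=lambda item: item[1])


def compare_models(csv_path, target):
    """Train a GBM and a GLM on the same split; return test AUC per model."""
    # Imported lazily so pick_best() can be used without H2O installed.
    import h2o
    from h2o.estimators import (
        H2OGeneralizedLinearEstimator,
        H2OGradientBoostingEstimator,
    )

    h2o.init()  # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)
    # Assumption: the target is a binary label, so treat it as categorical.
    frame[target] = frame[target].asfactor()
    train, test = frame.split_frame(ratios=[0.8], seed=1)
    features = [col for col in frame.columns if col != target]

    # Changing the algorithm amounts to changing the estimator object.
    candidates = {
        "gbm": H2OGradientBoostingEstimator(seed=1),
        "glm": H2OGeneralizedLinearEstimator(family="binomial"),
    }
    scores = {}
    for name, model in candidates.items():
        model.train(x=features, y=target, training_frame=train)
        scores[name] = model.model_performance(test).auc()
    return scores
```

With a hypothetical file and column name, usage might look like `scores = compare_models("customers.csv", "will_buy")` followed by `pick_best(scores)`. Keeping the estimators in a dictionary means that trying another algorithm is a one-line change, while the data split and metric stay the same for a fair comparison.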

The study note will introduce you to the use of Flow. We will also discuss the use of AutoML, which identifies the best-performing algorithm on your dataset. Excited to learn H2O? Keep reading!




