Blog, Courses

Data Science specialisations available on Coursera.

Here is a selection of specializations in data science available on Coursera:

- Data Mining
- Data Science
- Data Structures and Algorithms
- Cloud Computing
- Genomic Data Science
- Data Analysis and Interpretation
- Data Science at Scale
- Mastering Software Development in R
- Executive Data Science
- Methods and Statistics in Social Sciences…

Blog, Competitions

Gentle Intro to machine learning competitions with ‘Numerai’ – step by step.

In this post, I will show how simple it is to start competing in a machine learning tournament: Numerai. I will go step by step, explaining line by line what each part does and why it is required. Numerai is a global artificial intelligence competition to predict the behavior of the stock market. Numerai is a little bit similar…

Blog, Questions and Answers

What is TensorFlow?

TensorFlow is an open-source software library for machine learning, developed by the Google Brain team at Google. The name TensorFlow derives from the operations that neural networks perform on multidimensional data arrays, which are often referred to as “tensors”. It uses data flow graphs and is capable of building and training a variety of different…
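To make the idea of a data flow graph a little more concrete, here is a toy sketch in plain Python. It is not TensorFlow's API: the `Node` class and the `constant`, `matmul` and `add` helpers are hypothetical names for illustration. Each node is an operation, the edges carry multidimensional arrays (nested lists standing in for tensors), and nothing is computed until the output node is evaluated.

```python
from typing import Callable, Dict, List, Optional, Sequence

Matrix = List[List[float]]  # a 2-D "tensor" as nested lists


class Node:
    """A graph node: an operation plus the nodes whose outputs feed it."""

    def __init__(self, op: Callable[..., Matrix], inputs: Sequence["Node"] = ()) -> None:
        self.op = op
        self.inputs = list(inputs)

    def evaluate(self, cache: Optional[Dict[int, Matrix]] = None) -> Matrix:
        # Evaluate the inputs first, then this op; the cache ensures a node shared
        # by several consumers is computed only once per evaluation.
        if cache is None:
            cache = {}
        if id(self) not in cache:
            args = [node.evaluate(cache) for node in self.inputs]
            cache[id(self)] = self.op(*args)
        return cache[id(self)]


def constant(value: Matrix) -> Node:
    # A source node with no inputs that always outputs the same tensor.
    return Node(lambda: value)


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix multiplication: (n x k) times (k x m) gives (n x m).
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def add(a: Matrix, b: Matrix) -> Matrix:
    # Element-wise addition of two equally shaped matrices.
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Build the graph for y = x*w + b; nothing is computed at this point.
x = constant([[1.0, 2.0]])
w = constant([[3.0], [4.0]])
b = constant([[0.5]])
y = Node(add, [Node(matmul, [x, w]), b])

# Evaluating the output node pulls data through the graph.
print(y.evaluate())  # [[11.5]]
```

Separating graph construction from execution is what lets a real framework optimize, distribute or differentiate the graph before running it; this toy only shows the deferred-evaluation part.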

Articles, Blog

Top 10 “MUST KNOW” from Python-Pandas for Data Science.

Pandas is a very popular Python library for data analysis, manipulation, and visualization. Here is my personal view of the functions and snippets most often used for data analysis.

1. Import pandas into Python:

import pandas as pd

2. Import data from a CSV/Excel file:

df = pd.read_csv('C:/Folder/mlhype.csv')…
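A minimal, runnable version of the two snippets above. To keep it self-contained, it reads a small in-memory CSV through `io.StringIO` instead of the file at `C:/Folder/mlhype.csv`; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for C:/Folder/mlhype.csv.
csv_text = """name,age,score
Alice,34,88.5
Bob,29,92.0
Carol,41,79.5
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)    # (3, 3)
print(df.dtypes)   # column types inferred from the data
print(df.head(2))  # first two rows
```

For Excel files, `pd.read_excel` is used in much the same way, though it needs an engine such as `openpyxl` installed for `.xlsx` files.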

Blog, Glossary

What is Hadoop YARN?

Hadoop YARN (Yet Another Resource Negotiator) is the architectural center of Hadoop: it allows multiple data processing engines, such as interactive SQL, real-time streaming, data science, and batch processing, to handle data stored on a single platform, opening up a new approach to analytics. YARN is the foundation of the new generation of Hadoop and is enabling organizations everywhere…

Blog, Glossary

What is Hadoop Flume?

Hadoop Flume was created as an Apache Incubator project to let you move data from a source into your Hadoop environment. In Flume, the entities you work with are called sources, decorators, and sinks. A source can be any data source, and Flume has many predefined source adapters. A sink is the…
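The source, decorator and sink roles can be illustrated with a toy pipeline in plain Python. This is only a conceptual sketch, not Flume's API or configuration format; the function and class names are hypothetical, and a Python list stands in for the Hadoop destination.

```python
from typing import Dict, Iterable, Iterator, List

Event = Dict[str, str]


def line_source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turns raw input (here, log lines) into events."""
    for line in lines:
        yield {"body": line}


def drop_blank(events: Iterator[Event]) -> Iterator[Event]:
    """Decorator: filters events while they flow through."""
    for event in events:
        if event["body"].strip():
            yield event


def add_host(events: Iterator[Event], host: str) -> Iterator[Event]:
    """Decorator: annotates each event with extra metadata."""
    for event in events:
        yield {**event, "host": host}


class ListSink:
    """Sink: the final destination; a list stands in for HDFS here."""

    def __init__(self) -> None:
        self.stored: List[Event] = []

    def consume(self, events: Iterable[Event]) -> None:
        for event in events:
            self.stored.append(event)


log_lines = ["GET /index.html 200", "", "POST /login 302"]
sink = ListSink()
# Source -> decorators -> sink; generators hand events along one at a time.
sink.consume(add_host(drop_blank(line_source(log_lines)), host="web-01"))
print(sink.stored)
```

Because each stage is a generator, events stream through the chain one by one rather than being collected in full at each step, which loosely mirrors how data flows from a source to a sink.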

Blog, Glossary

What is Apache Kafka?

Apache Kafka is an open-source stream processing platform, developed by the Apache Software Foundation and written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a “massively scalable pub/sub message queue architected as a distributed transaction log, making it highly valuable…
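The phrase "pub/sub message queue architected as a transaction log" can be sketched with a tiny in-memory model: producers append records to an ordered log, each record gets an offset, and each consumer group tracks its own read position, so several subscribers read the same data independently. This is a hypothetical illustration, not the Kafka client API; real Kafka also partitions, replicates and persists the log.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TopicLog:
    """Toy single-partition topic: an append-only log plus per-group offsets."""

    def __init__(self) -> None:
        self._records: List[str] = []
        # consumer group name -> offset of the next record it will read
        self._offsets: Dict[str, int] = defaultdict(int)

    def publish(self, message: str) -> int:
        """Append a message and return its offset; records are never modified."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, group: str, max_records: int = 10) -> List[Tuple[int, str]]:
        """Return up to max_records unread (offset, message) pairs for a group."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return [(start + i, message) for i, message in enumerate(batch)]


topic = TopicLog()
for event in ["page_view", "click", "purchase"]:
    topic.publish(event)

print(topic.poll("analytics"))               # [(0, 'page_view'), (1, 'click'), (2, 'purchase')]
print(topic.poll("billing", max_records=1))  # [(0, 'page_view')]
print(topic.poll("analytics"))               # []
```

Reading never deletes anything: the "billing" group can still consume records that "analytics" has already read, which is the key difference from a classic queue.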

Blog, Glossary

What is Hadoop Zookeeper?

Hadoop ZooKeeper is an open-source Apache™ project that provides centralized infrastructure and services that enable synchronization across a cluster. ZooKeeper maintains common objects needed in large cluster environments, such as configuration information and a hierarchical namespace. Applications can use these services to coordinate distributed processing across large clusters. Name services,…
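ZooKeeper organizes its data as a tree of nodes (znodes) addressed by slash-separated paths, much like a file system. The sketch below is a toy in-memory model of that hierarchical namespace holding configuration values. It is not ZooKeeper's API (a real client talks to a running ZooKeeper ensemble), and the class names and paths are hypothetical.

```python
from typing import Dict, List, Optional


class ZNode:
    """One node in the tree: a small data payload plus named children."""

    def __init__(self, data: bytes = b"") -> None:
        self.data = data
        self.children: Dict[str, "ZNode"] = {}


class ToyNamespace:
    """Toy in-memory model of a hierarchical, path-addressed namespace."""

    def __init__(self) -> None:
        self.root = ZNode()  # the root "/" always exists

    @staticmethod
    def _parts(path: str) -> List[str]:
        # "/app/config" -> ["app", "config"]; only absolute paths are accepted.
        if not path.startswith("/"):
            raise ValueError("paths must be absolute, e.g. /app/config")
        return [p for p in path.split("/") if p]

    def _walk(self, parts: List[str]) -> Optional[ZNode]:
        # Follow the path from the root; return None if any component is missing.
        node = self.root
        for part in parts:
            child = node.children.get(part)
            if child is None:
                return None
            node = child
        return node

    def create(self, path: str, data: bytes = b"") -> None:
        # As in ZooKeeper, the parent node must already exist.
        parts = self._parts(path)
        if not parts:
            raise ValueError("the root node already exists")
        parent = self._walk(parts[:-1])
        if parent is None:
            raise KeyError(f"parent of {path} does not exist")
        if parts[-1] in parent.children:
            raise KeyError(f"{path} already exists")
        parent.children[parts[-1]] = ZNode(data)

    def get(self, path: str) -> bytes:
        node = self._walk(self._parts(path))
        if node is None:
            raise KeyError(path)
        return node.data

    def get_children(self, path: str) -> List[str]:
        node = self._walk(self._parts(path))
        if node is None:
            raise KeyError(path)
        return sorted(node.children)


ns = ToyNamespace()
ns.create("/app")
ns.create("/app/config")
ns.create("/app/config/db_url", b"postgres://db:5432")
ns.create("/app/config/max_workers", b"8")

print(ns.get_children("/app/config"))     # ['db_url', 'max_workers']
print(ns.get("/app/config/max_workers"))  # b'8'
```

Every process in the cluster reading the same path sees the same configuration value, which is the kind of shared object the paragraph above describes.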