Data Science specializations available on Coursera. Here is a selection of specializations on data science available on Coursera: Data Mining, Data Science, Data Structures and Algorithms, Cloud Computing, Genomic Data Science, Data Analysis and Interpretation, Data Science at Scale, Mastering Software Development in R, Executive Data Science, Methods and Statistics in Social Sciences… Read More Data Science specialisations available on Coursera.
In a previous post, I described very basic code for getting started with Numerai, a machine learning competition. This post is a continuation, so if you haven’t read the previous one, I recommend doing so first: here. In this post, we will add a little more complexity to the whole process. We will split out 20%… Read More Numerai – Deep Learning – simple example
In this post, I will show how simple it is to start competing in the Numerai machine learning tournament. I will go step by step, line by line, explaining what each line does and why it is required. Numerai is a global artificial intelligence competition to predict the behavior. Numerai is a little bit similar… Read More Gentle Intro to machine learning competitions with ‘Numerai’ – step by step.
TensorFlow Quick Reference Table – Cheat Sheet. TensorFlow is a very popular deep learning library, but its complexity can be overwhelming, especially for new users. Here is a short summary of frequently used functions. If you want to download it as a PDF, it is available here: TensorFlow Cheat Sheet – WWW.MLHYPE.COM If you find it useful… Read More TensorFlow Quick Reference Table – Cheat Sheet.
What is TensorFlow? TensorFlow is an open source software library for machine learning developed by the Google Brain team at Google. The name TensorFlow derives from the operations that neural networks perform on multidimensional data arrays, often referred to as “tensors”. It uses data flow graphs and is capable of building and training a variety of different… Read More What is TensorFlow?
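To make the idea of a data flow graph a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration of the concept: the `Node` class and the `constant`, `add`, and `mul` helpers are hypothetical names invented for this example and are not part of TensorFlow's API. The graph is described first and evaluated only when `run()` is called.

```python
import operator


class Node:
    """One vertex of a toy data flow graph (hypothetical, not TensorFlow's API)."""

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op          # function applied to the input values, or None for a constant
        self.inputs = inputs  # upstream nodes whose results flow into this node
        self.value = value    # stored value, used only by constant nodes

    def run(self):
        # Constants return their value; other nodes evaluate their inputs first.
        if self.op is None:
            return self.value
        return self.op(*(node.run() for node in self.inputs))


def constant(value):
    return Node(value=value)


def add(a, b):
    return Node(operator.add, (a, b))


def mul(a, b):
    return Node(operator.mul, (a, b))


# Describe y = w * x + b as a graph; nothing is computed at this point.
x = constant(3.0)
w = constant(2.0)
b = constant(1.0)
y = add(mul(w, x), b)

# Values flow through the graph only when the output node is run.
result = y.run()
print(result)
```

Separating the description of a computation from its execution is the design choice this sketch is meant to show; a real library works with multidimensional arrays rather than single floats.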
Top 10 “MUST KNOW” from Python-Pandas for Data Science. Pandas is a very popular Python library for data analysis, manipulation, and visualization. I would like to share my personal list of the most frequently used functions and snippets for data analysis. 1. Import Pandas to Python: import pandas as pd 2. Import data from a CSV/Excel file: df = pd.read_csv('C:/Folder/mlhype.csv')… Read More Top 10 “MUST KNOW” from Python-Pandas for Data Science.
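The first two steps from that list can be tried without any file on disk. The sketch below assumes pandas is installed and uses a small in-memory CSV as a stand-in for a file such as mlhype.csv; the column names and values are made up purely for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = "name,score\nalice,0.9\nbob,0.7\n"

# read_csv accepts a file path or any file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (number of rows, number of columns)
print(df.head())  # first rows of the table
```

With a real file, the `io.StringIO(...)` argument would simply be replaced by the path to the CSV.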
Hadoop YARN is the architectural center of Hadoop that allows multiple data processing engines such as interactive SQL, real-time streaming, data science and batch processing to handle data stored on a single platform, unlocking an entirely new approach to analytics. YARN is the foundation of the new generation of Hadoop and is enabling organizations everywhere… Read More What is Hadoop YARN?
Hadoop Flume was created as an Apache Incubator project to allow you to flow data from a source into your Hadoop environment. In Flume, the entities you work with are called sources, decorators, and sinks. A source can be any data source, and Flume has many predefined source adapters. A sink is the… Read More What is Hadoop Flume?
Apache Kafka is an open-source stream processing platform developed by the Apache Software Foundation, written in Scala and Java. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds. Its storage layer is essentially a “massively scalable pub/sub message queue architected as a distributed transaction log,” making it highly valuable… Read More What is Apache Kafka?
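The phrase "pub/sub message queue architected as a transaction log" can be illustrated with a toy, single-process sketch. This is not Kafka's client API: `TopicLog`, `publish`, and `poll` are hypothetical names chosen for this example, and the real system is distributed and persistent. The point is only that messages are appended to a log and each consumer tracks its own read position.

```python
from collections import defaultdict


class TopicLog:
    """Toy append-only log for one topic (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._records = []                # messages in arrival order, never rewritten
        self._offsets = defaultdict(int)  # consumer name -> next offset to read

    def publish(self, message):
        # Append the message and return the offset it was stored at.
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        # Return unread messages for this consumer and advance its offset.
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = TopicLog()
for event in ["click", "view", "purchase"]:
    log.publish(event)

# Each consumer reads the same log independently, at its own pace.
first = log.poll("analytics", max_records=2)
second = log.poll("analytics")
billing = log.poll("billing")
print(first, second, billing)
```

Because reading does not remove anything from the log, a new consumer can start from the beginning at any time, which is the main difference from a classic queue.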
Hadoop ZooKeeper is an open source Apache™ project that provides a centralized infrastructure and services that enable synchronization across a cluster. ZooKeeper maintains common objects needed in large cluster environments. Examples of these objects include configuration information, hierarchical naming space, etc. Applications can leverage these services to coordinate distributed processing across large clusters. Name services,… Read More What is Hadoop Zookeeper?