Open Hybrid Architecture: Real World Use-Case
Building on the vision and concepts outlined previously in Arun and Saumitra’s blogs, we wanted to show the Open Hybrid Architecture Initiative (OHAI) concepts in action, and see how they could be used...
A Step-by-Step Replication Guide between On-Prem HDFS and Amazon Web Services
This blog was co-authored by Ryan Peterson, Head of Global Data Segment at AWS. Central to empowering businesses to deliver the right data in the right environment to power the right use case is the...
An S3 Gateway to Apache Hadoop Ozone
The AWS S3 protocol is the de facto interface for modern object stores. The Ozone-0.3.0-Alpha release adds the S3 protocol as a first-class notion to Ozone. For all practical purposes, a user of S3 can start...
Open Hybrid Architecture: O3, the New Rocket Ship
Introducing our Storage Environment O3: Building on the last three blogs (vision, key tenets/concepts, real-world use case) in the Open Hybrid Architecture series, we now want to take a deeper dive into...
Getting the Most Out of Your Data in the Cloud with Cloudbreak
There are three common abilities across the cloud providers that I want to focus on, to see how they work together and build on each other to help you maximize agility and data insights in the...
Data Science & Engineering Platform: Data Lineage and Provenance for Apache...
This is the third in a series of data engineering blogs that we plan to publish. The first blog outlined the data science and data engineering capabilities of Hortonworks Data Platform. Motivation...
2x Faster BI Interactive queries with HDP 3.0
Hortonworks announced the general availability of HDP 3.0 this year. You may read more about it here. Bundled with HDP 3.0, Apache Hive 3 with LLAP took a significant leap as an Enterprise Ready Real...
Open Hybrid Architecture: Running Stateful Containers on YARN
The Why: In the previous blog, we talked about the Open Hybrid Architecture. This architecture decouples storage and computation, so computation tasks need to access various types of storage systems....
Big Data Processing Engines – Which one do I use?: Part 1
Special thanks to Bill Preachuk and Brandon Wilson for reviewing and providing their expertise. Introduction: Columnar storage is an often-discussed topic in the big data processing and storage world...
Monitoring Kafka Streams Microservices with Hortonworks Streams Messaging...
In last week’s blog Secure and Governed Microservices with HDF/HDP Kafka Streams Support, we walked through how to build microservices with the new Kafka Streams support in HDF 3.3 and HDP 3.1 that is...
Introducing Hive-Kafka integration for real-time Kafka SQL queries
Our last few blogs as part of the Kafka Analytics blog series focused on the addition of Kafka Streams to HDP and HDF and how to build, secure, and monitor Kafka Streams apps / microservices. In this blog,...
{Submarine} : Running deep learning workloads on Apache Hadoop
(This blog post is co-authored by Xun Liu and Quan Zhou from Netease.) Introduction: Hadoop is the most popular open source framework for the distributed processing of large, enterprise data sets. It is...
Query Federation with Apache Hive
Organizations commonly use a plethora of data storage and processing systems today. These different systems offer cost-effective performance for their respective use cases. Besides traditional RDBMSs...
Apache Hive Warehouse Connector Use-Cases
1. Motivation: The HiveWarehouseConnector (HWC) is an open-source library which provides new interoperability capabilities between Hive and Spark. In practice, Hive and Spark are often leveraged...
Open Hybrid Architecture Initiative: Game Changing User Experience Powering...
This is part seven of an ongoing series about the Open Hybrid Architecture Initiative. You can learn more about the vision, key tenets, real-world use case, new storage environment of O3,...