Open Source Communities: The Developer Kingdom
Since our founding in 2011, Hortonworks has had a fundamental belief: the only way to deliver infrastructure platform technology is completely in open source. Moreover, we believe that collaborative...
Storm and Kafka Together: A Real-time Data Refinery
Hortonworks Data Platform’s YARN-based architecture enables multiple applications to share a common cluster and data set while ensuring consistent levels of response made possible by a centralized...
Start of a New Era: Apache HBase 1.0
The Apache HBase community has released Apache HBase 1.0.0. Seven years in the making, it marks a major milestone in the Apache HBase project’s development, offers some exciting features and new APIs...
5 Ways to Make Your Hive Queries Run Faster
As a data scientist working with Hadoop, I often use Apache Hive to explore data, make ad-hoc queries or build data pipelines. Until recently, optimizing Hive queries focused mostly on data layout...
Introducing Rolling Upgrades and Downgrades of an Apache Hadoop® YARN Cluster
This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting...
Hive 0.14 Cost-Based Optimizer (CBO) Technical Overview
Analysts and data scientists, not to mention business executives, want Big Data not for the sake of the data itself, but for the ability to work with and learn from that data. As other users become more...
Apache Hadoop YARN in HDP 2.2: Fault-Tolerance Features for Long-running...
This is the second post in a series exploring the theme of long-running service workloads in YARN. See the introductory post. Long-running services deployed on YARN are by definition expected to...
Using PageRank to Detect Anomalies and Fraud in Healthcare
This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction: PageRank [1] is the...
Deploying Long Running Services on Apache Hadoop YARN Cluster
HDP 2.2 brings substantial innovations in Apache Hadoop YARN, enabling users of Apache Hadoop to efficiently store their data in a single repository and interact with it simultaneously using a wide...
Best Practices for Hive Authorization Using Apache Ranger in HDP 2.2
Apache Hive is the de facto standard for SQL in Hadoop, with more enterprises relying on this open source project than any other alternative. Stinger.next, a community-based effort, is delivering true...
Using PageRank to Detect Anomalies and Fraud in Healthcare
This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction: This is the second...
Enterprise-grade Rolling Upgrades in HDP 2.2
Introduction: Today, organizations use the Apache Hadoop™ stack in the form of a central data lake to store their critical datasets and power their analytical processing workloads. A key requirement for...
Deploying Long-running services on YARN using Apache Slider
We hosted an Apache Slider Meetup at our Hortonworks Santa Clara office on March 4th, where committers, contributors, and community members interested in Apache Slider congregated to hear what’s...
Apache Ambari Technical Workshop
Apache Ambari is the only 100% open source management and provisioning tool for Apache Hadoop. Recent innovations in Apache Ambari have focused on opening Apache Ambari into a pluggable management...
Using PageRank to Detect Anomalies and Fraud in Healthcare
This three-part series is co-authored by Ofer Mendelevitch, director of data science at Hortonworks, and Jiwon Seo, Ph.D. and research assistant at Stanford University. Introduction: This is the third...
New HDP Certified Developer (HDPCD) Performance-Based Exam
Hortonworks is excited to announce that our first hands-on, performance-based certification exam is now available! The HDP Certified Developer (HDPCD) exam is designed for Hadoop developers working...
Announcing Apache Ambari 2.0
Advances in Hadoop security, governance and operations have accelerated adoption of the platform by enterprises everywhere. Apache Ambari is the open source operational platform for provisioning,...
HDFS Rolling Upgrades
This is the third post in a series exploring recent innovations in the Hadoop ecosystem that are included in Hortonworks Data Platform (HDP) 2.2. In this post, we introduce the theme of supporting...
Introducing Automated Rolling Upgrades with Apache Ambari 2.0
The recent post by Jayush Luniya announced the community release of Apache Ambari 2.0. One of the three key Ambari features that Jayush discussed was Rolling Upgrades, enabling Hadoop operators to...
Ambari 2.0 for Deploying Comprehensive Hadoop Security
Hortonworks Data Platform (HDP) provides centralized enterprise services for comprehensive security to enable end-to-end protection, access, compliance and auditing of data in motion and at rest. HDP’s...