Hadoop GroupMapping – LDAP Integration
LDAP provides a central source for maintaining users and groups within an enterprise. There are two ways to use LDAP groups within Hadoop. The first is to use OS level configuration to read LDAP...
Apache Storm and Hadoop
In February 2014, the Apache Storm community released Storm version 0.9.1. Storm is a distributed, fault-tolerant, and high-performance real-time computation system that provides strong guarantees on...
Introduction to Apache Falcon: Data Governance for Hadoop
Apache Falcon is a data governance engine that defines, schedules, and monitors data management policies. Falcon allows Hadoop administrators to centrally define their data pipelines, and then Falcon...
Advancing Enterprise Hadoop with Hortonworks Data Platform 2.1
The pace of innovation within the Apache Hadoop community is truly remarkable, enabling us to announce the availability of Hortonworks Data Platform 2.1, incorporating the very latest innovations from...
Tutorials for Hadoop with HDP 2.1: Hive, Tez, Falcon, Knox, Storm
If you’re excited to get started with the new features in Hortonworks Data Platform 2.1, then we’ve included 4 tutorials for you to try out – Sandbox-style. You can download the HDP 2.1 Technical Preview...
HDFS ACLs: Fine-Grained Permissions for HDFS Files in Hadoop
Securing any system requires you to implement layers of protection. Access Control Lists (ACLs) are typically applied to data to restrict access to approved entities. Application of ACLs at...
Introducing Apache Tez 0.4
We are excited to announce that the Apache™ Tez community voted to release version 0.4 of the software. Apache Tez is an alternative to MapReduce that provides a powerful framework for executing a...
Apache Hadoop 2.4.0 Released!
It gives me great pleasure to announce that the Apache Hadoop community has voted to release Apache Hadoop 2.4.0! Thank you to every single one of the contributors, reviewers and testers! The community...
How To Get Started with Cascading and Hadoop on Hortonworks Data Platform
The power of a well-crafted speech is indisputable, for words matter—they inspire to act. And so is the power of a well-designed Software Development Kit (SDK), for high-level abstractions and logical...
Announcing Apache Hive 0.13 and Completion of the Stinger Initiative!
The Apache Hive community has voted on and released version 0.13 today. This is a significant release that represents a major effort from over 70 members who worked diligently to close out over 1080...
Apache Ambari 1.5.1 is Released!
Yesterday the Apache Ambari community proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work...
Announcing Apache Knox Gateway 0.4.0 for Hadoop Security
The Apache Knox Gateway team is pleased to announce Knox’s first release as an Apache top-level project: Apache Knox Gateway 0.4.0. The team resolved approximately 100 JIRAs for this release and Knox...
Automated Install of HDP 2.1 for Hadoop on Windows
Hortonworks Data Platform 2.1 for Windows is the 100% open source data management platform based on Apache Hadoop and available for the Microsoft Windows Server platform. I have built a helper tool...
Seven BoFs for the Hadoop Summit
The term BoF session was first used at the Digital Equipment Users’ Society (DECUS) conference in the 1960s. Its essence was to bring together like minds and thought leaders, just as birds of...
Spark for Data Science: A Case Study
I’m a pretty heavy Unix user and I tend to prefer doing things the Unix Way™, which is to say, composing many small command line oriented utilities. With composability comes power and with...
Apache Hadoop YARN: Resilience of YARN Applications across Resource Manager...
This is the first post in our series on the motivations and architecture for improvements to Apache Hadoop YARN’s Resource Manager Restart resiliency. Others in the series are: Resilience of Apache...
Apache Hadoop YARN: Resilience of YARN applications across ResourceManager...
This is the second post in our series on the motivations and architecture for improvements to Apache Hadoop YARN’s Resource Manager Restart resiliency. Others in the series are: Introduction: Apache YARN...
Discover HDP 2.1: New Features for Security and Apache Knox
Last week Vinay Shukla and Kevin Minder hosted the first of our seven Discover HDP 2.1 webinars. Vinay and Kevin covered three important topics related to new Apache Hadoop security features in HDP...
Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
On May 15, Owen O’Malley and Carter Shanklin hosted the second of our seven Discover HDP 2.1 webinars. Owen and Carter discussed the Stinger Initiative and the improvements to Apache Hive that are...
Eight Big Data and Hadoop Meetups for the Hadoop Summit San Jose
According to the New York Observer, there were a couple of major social reasons that spurred the genesis and growth of Meetup.com. The first was Robert Putnam’s book Bowling Alone, in which he talks about...