
Best practices in HDFS authorization with Apache Ranger


HDFS is a core part of any Hadoop deployment, and to ensure that data is protected in the Hadoop platform, security needs to be baked into the HDFS layer. HDFS is protected using Kerberos for authentication, and authorization is enforced using POSIX-style permissions, HDFS ACLs, or Apache Ranger.

Apache Ranger (http://hortonworks.com/hadoop/ranger/) is a centralized security administration solution  for Hadoop that enables administrators to create and enforce security policies for HDFS and other Hadoop platform components.

How do Ranger policies work for HDFS?

To ensure security in HDP environments, we recommend that all of our customers implement Kerberos, Apache Knox, and Apache Ranger.

Apache Ranger offers a federated authorization model for HDFS. The Ranger plugin for HDFS checks for Ranger policies, and if a matching policy exists, access is granted to the user. If no policy exists in Ranger, the check falls back to the native permissions model in HDFS (POSIX permissions or HDFS ACLs). This federated model applies to the HDFS and YARN services in Ranger.
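The fallback logic can be sketched in a few lines of Python. This is illustrative pseudologic only: the real check runs inside the NameNode via the Ranger HDFS plugin, and the policy structure and function names here are hypothetical simplifications.

```python
# Illustrative sketch of Ranger's federated HDFS authorization decision.
# Policy structure and names are hypothetical; the real enforcement
# happens inside the NameNode via the Ranger HDFS plugin.

def is_access_allowed(user, path, access, ranger_policies, hdfs_native_check):
    """Grant via Ranger if a policy covers the path; else fall back to HDFS."""
    matching = [p for p in ranger_policies if path.startswith(p["resource"])]
    if matching:
        # A Ranger policy exists for this resource: Ranger decides.
        return any(user in p["users"] and access in p["accesses"]
                   for p in matching)
    # No Ranger policy: defer to native POSIX permissions / HDFS ACLs.
    return hdfs_native_check(user, path, access)

policies = [{"resource": "/apps/hive",
             "users": {"hive", "analyst1"},
             "accesses": {"read", "write"}}]
# Stand-in for the native POSIX/ACL check.
posix = lambda user, path, access: user == "hdfs"

print(is_access_allowed("analyst1", "/apps/hive/warehouse", "read", policies, posix))  # True
print(is_access_allowed("analyst1", "/data/raw", "read", policies, posix))             # False
```

The key point the sketch captures is that a matching Ranger policy is authoritative, even when it denies access; the native check is consulted only when no policy covers the resource.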


For other services, such as Hive or HBase, Ranger operates as the sole authorizer, which means only Ranger policies are in effect. The fallback model is configured using a property under Ambari → Ranger → HDFS config → Advanced ranger-hdfs-security.


The federated authorization model enables customers to safely implement Ranger in an existing cluster without affecting jobs that rely on POSIX permissions. We recommend enabling this option as the default model for all deployments.

Ranger’s user interface makes it easy for administrators to find which permission (a Ranger policy or native HDFS permissions) granted access to a user. Simply navigate to Ranger → Audit and look at the Access Enforcer column of the audit data. If the value is “ranger-acl”, a Ranger policy provided the access; if it is “hadoop-acl”, the access was granted by a native HDFS ACL or POSIX permission.
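For example, given audit records shaped like the rows in that view (the field names below are illustrative, not Ranger's exact audit schema), separating Ranger-granted from HDFS-granted accesses is a simple filter:

```python
# Hypothetical audit rows modeled on Ranger's audit view; field names
# are illustrative, not the exact Ranger audit schema.
audits = [
    {"user": "hive",  "resource": "/apps/hive/warehouse", "enforcer": "ranger-acl"},
    {"user": "mktg1", "resource": "/user/mktg1",          "enforcer": "hadoop-acl"},
    {"user": "hive",  "resource": "/apps/hive/warehouse", "enforcer": "ranger-acl"},
]

# Accesses granted by Ranger policies vs. native HDFS permissions/ACLs.
ranger_granted = [a for a in audits if a["enforcer"] == "ranger-acl"]
hdfs_granted   = [a for a in audits if a["enforcer"] == "hadoop-acl"]

print(len(ranger_granted), len(hdfs_granted))  # 2 1
```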


Best practices for HDFS authorization

Having a federated authorization model may create a challenge for security administrators planning a security model for HDFS.

After Apache Ranger and Hadoop have been installed, we recommend that administrators implement the following steps:

  • Change the HDFS umask to 077
  • Identify directories that can be managed by Ranger policies
  • Identify directories that need to be managed by HDFS native permissions
  • Enable a Ranger policy to audit all records

Here are the steps again in detail.

  1. Change the HDFS umask to 077 from the default of 022. This prevents new files or folders from being accessed by anyone other than the owner.

Administrators can change this property via Ambari:


The default umask in HDFS is 022, which grants all users read access to new HDFS folders and files. You can verify this by running the following command on a freshly installed cluster:

$ hdfs dfs -ls /apps
Found 3 items
drwxrwxrwx   - falcon hdfs          0 2015-11-30 08:02 /apps/falcon
drwxr-xr-x   - hdfs   hdfs          0 2015-11-30 07:56 /apps/hbase
drwxr-xr-x   - hdfs   hdfs          0 2015-11-30 08:01 /apps/hive
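The umask arithmetic behind these values is standard POSIX behavior, not anything HDFS-specific: new directories start from a base mode of 777 and new files from 666, and the umask bits are cleared. A quick Python check:

```python
# Standard POSIX umask math: the umask bits are cleared from the base
# mode (0o777 for new directories, 0o666 for new files).
def default_mode(base, umask):
    return base & ~umask

print(oct(default_mode(0o777, 0o022)))  # 0o755 -> rwxr-xr-x (dirs, umask 022)
print(oct(default_mode(0o777, 0o077)))  # 0o700 -> rwx------ (dirs, umask 077)
print(oct(default_mode(0o666, 0o077)))  # 0o600 -> rw------- (files, umask 077)
```

This is why moving the umask from 022 to 077 strips group and world access from everything created afterwards, while leaving existing files untouched.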

  2. Identify the directories that can be managed by Ranger policies.

We recommend that permissions for application data folders (/apps/hive, /apps/hbase) as well as any custom data folders be managed through Apache Ranger. The HDFS native permissions for these directories need to be restrictive; this can be done by changing permissions in HDFS using chmod.

Example:

$ hdfs dfs -chmod -R 000 /apps/hive
$ hdfs dfs -chown -R hdfs:hdfs /apps/hive
$ hdfs dfs -ls /apps/hive
Found 1 items
d---------   - hdfs hdfs          0 2015-11-30 08:01 /apps/hive/warehouse

Then navigate to the Ranger admin UI and grant explicit permissions to users as needed. For example:


Administrators should follow the same process for other data folders as well. You can validate whether your changes are in effect by doing the following:

  • Connect to HiveServer2 using beeline
  • Create a table
    • create table employee( id int, name String, ssn String);
  • Go to Ranger and check the HDFS access audit. The Access Enforcer should be ‘ranger-acl’.
  3. Identify directories which can be managed by HDFS permissions.

It is recommended to let HDFS manage the permissions for the /tmp and /user folders. These are used by applications and jobs which create user-level directories.

You should also set the initial permissions for folders under /user to “700”, as in the example below:

$ hdfs dfs -ls /user
Found 4 items
drwxrwx---   - ambari-qa hdfs          0 2015-11-30 07:56 /user/ambari-qa
drwxr-xr-x   - hcat      hdfs          0 2015-11-30 08:01 /user/hcat
drwxr-xr-x   - hive      hdfs          0 2015-11-30 08:01 /user/hive
drwxrwxr-x   - oozie     hdfs          0 2015-11-30 08:02 /user/oozie

$ hdfs dfs -chmod -R 700 /user/*
$ hdfs dfs -ls /user
Found 4 items
drwx------   - ambari-qa hdfs          0 2015-11-30 07:56 /user/ambari-qa
drwx------   - hcat      hdfs          0 2015-11-30 08:01 /user/hcat
drwx------   - hive      hdfs          0 2015-11-30 08:01 /user/hive
drwx------   - oozie     hdfs          0 2015-11-30 08:02 /user/oozie

  4. Ensure auditing for all HDFS data.

Auditing in Apache Ranger is controlled through policies. When Apache Ranger is installed through Ambari, a default policy is created for all files and directories in HDFS, with the audit option enabled. This policy is also used by the Ambari smoke-test user “ambari-qa” to verify the HDFS service through Ambari. If administrators disable this default policy, they need to create a similar policy to enable auditing across all files and folders.


Summary

Securing HDFS files through permissions is a starting point for securing Hadoop. Ranger provides a centralized interface for managing security policies for HDFS. We recommend that security administrators use a combination of HDFS native permissions and Ranger policies to provide comprehensive coverage for all potential use cases. Using the best practices outlined in this blog, administrators can simplify the access control policies for administrative and user directories and files in HDFS.

The post Best practices in HDFS authorization with Apache Ranger appeared first on Hortonworks.


What’s new in SmartSense 1.2?


Built into Ambari 2.2, with a new SmartSense Gateway for complex corporate network environments and expanded rule sets.

Hortonworks launched SmartSense in 2015 to help customers quickly collect cluster configuration, metrics, and logs to proactively detect issues and expedite support case troubleshooting. This diagnostic information is packaged into an encrypted and anonymized bundle and sent to Hortonworks for analysis. The result of that analysis is made available as customized recommendations to help prevent issues and improve the performance and operations of the cluster.

With SmartSense 1.2, we focused on the following key areas:

  • Ease-of-Use
  • Automatic Upload
  • Expanded Recommendations

Ease-of-Use: Built Into Ambari 2.2

SmartSense 1.2 is now one of the default services in Ambari 2.2, enabling customers to simply choose it as a service during install or post-install. This eliminates the manual steps that were previously required to install and activate SmartSense. For many customers, this means a change request isn’t required to get SmartSense deployed to their cluster.

Automatic Upload: Complex Network Environments

Any time information needs to leave the datacenter, the logistics of data transfer become complicated. With SmartSense 1.2, a new component, the SmartSense Gateway, allows for a single firewall rule and a central point of egress for bundles leaving the datacenter. The SmartSense Gateway can be deployed anywhere within the organization and shared between multiple SmartSense instances.

More information on deploying the Gateway and setting up an automatic analysis bundle capture schedule is available here.

Expanded Recommendations: More is Better

Bundles sent to Hortonworks for analysis come back with security, performance, and operations recommendations for our key stack components. In this release we’ve increased our rule set by 4x, allowing us to cover more HDP component versions and provide more in-depth analysis for HDFS, YARN, MapReduce, Tez, Hive, and HBase. Here is what our customers are saying about SmartSense:

“Hadoop configuration is something that we’ve struggled with quite a bit, and having SmartSense take the guesswork out of our configuration validation by giving us a single source of truth has been immensely valuable”

“The continuous analysis provided by SmartSense provides tremendous value”

“Our jobs are now complete in 10 minutes as opposed to 45 minutes before SmartSense.”

“I sleep better at night knowing that my cluster configuration is current”

Getting Started

SmartSense works with both Ambari- and non-Ambari-managed clusters. All versions of Linux supported by HDP are certified with SmartSense, including non-root deployments. Installation is quick, and a complete end-to-end walkthrough is available at https://youtu.be/zx9HtUmkw8k

Need more information?  Take a look at the documentation available here, ask a question on HCC, or reach out to your Hortonworks Account Team.

The post What’s new in SmartSense 1.2? appeared first on Hortonworks.

Data Is Your Most Valuable Asset


I recently discussed the future of data with Arun Murthy, one of the founders of Hortonworks.  

The good news is that not only is the technology available, but Hortonworks customers are already getting competitive advantage from their data right now. The dream that Arun and the other founders had when they created Hortonworks, of powering the future of data, is now a reality.

We’ve also known for a while that the volume and variety of data is growing at an amazing rate, and it is only increasing. Now every connected thing is emitting a signal. The entire world is covered by 3G or 4G wireless, and 5G is only going to be faster and cheaper. It has also become practical and affordable to add sensors to just about anything.

But what’s next?  What do you need to do? After speaking with Arun, I came out with two key pieces of advice.

First, realize that data is the most valuable asset you own.    It is only through insights on data that businesses truly get to competitive differentiation nowadays.  Today’s digital world is otherwise just too complex to run intuitively.

Technologies like databases, cloud, and virtualization come and go. Data, on the other hand, is permanent, and its impact is tangible. But it needs to be collected and curated to derive its true value. Done right, it can change the very essence of how business is transacted.

There are retailers, for example, who have captured every single click ever on their web site and have designed their business around the insights they have derived from this data.   That sort of depth and breadth of data is unbelievably valuable, and has propelled their companies forward with foresight and insight.

Replacing that kind of data would be impossible, like changing memories of events and experiences. It just can’t be done.

Second, companies need to start now by finding and keeping all of their data.

Today, it is possible to cost-effectively collect and store all of that data: starting with customer behavioral data (clickstreams, locations, time, preferences, product usage, buying patterns), then expanding out to all business data, and eventually capturing the data that matters outside the business, across the supply chain and business ecosystem.

Businesses then also need to look to their apps. Most apps in the past held data hostage in separate silos; it was really hard to move and aggregate data in a meaningful way from one app to another. Companies are starting to realize that by sharing, collating, and aggregating data across all of their apps, data and the corresponding insights become a highly tangible asset. That’s what the notion of the modern data app is all about.

Listening to Arun, one can’t help but conclude that the future of data is really bright, and that everybody, from enterprises to organizations to individuals, is already benefitting from amazing advances across every industry and line of business.

The post Data Is Your Most Valuable Asset appeared first on Hortonworks.

Kicking off the Apache Metron Tech Preview 1 Blog Series


Hello from the Metron PM and Engineering Team

Over the last few months, you may have read a series of blogs written by the Metron Product Management and Engineering teams on CyberSecurity and Analytics and the role that Big Data / Hadoop / HDP plays in this space. In December 2015, James Sirota, Director of Security Solutions at Hortonworks, authored “Leveraging Big Data For Security Analytics,” which describes how the Cisco OpenSOC project evolved into Apache Metron, a Big Data security analytics framework. Apache Metron integrates a variety of open source big data technologies in order to offer a centralized tool for cyber security monitoring and analysis.

Michael Schiebel, CyberSecurity Strategist at Hortonworks, introduced himself via his blog series, “echo ‘Hello, world‘”, providing a glimpse into the challenges faced by a Security Operations Center (SOC) analyst today. He explains why looking at alerts generated by rules engines in point security solutions and security information and event management (SIEM) tools is the wrong approach; rather, next-generation security tools that get to the right data quickly are vital for SOC analysts to monitor, analyze, and perform front-line investigations of cyber security risks. Next in the series, in “CyberSecurity: The end of rules are Nigh” and “Why Context Matters”, Michael Schiebel describes how to find the few alerts that matter and the importance of a single platform that stores telemetry data (logs, network, packet capture, etc.) and provides data analysis tools. These tools help analysts to “look where the bullet holes aren’t”, as Schiebel puts it.

Today, the Hortonworks product management and engineering teams are kicking off a multi-part blog series on Apache Metron, a next gen security analytics application built by the Apache Metron Community led by Hortonworks. Over the course of the next few weeks, the team will release articles covering key Apache Metron topics:

  • Part 1: Apache Metron Explained (that’s what the rest of this blog is all about).
  • Part 2: Apache Metron User Personas and Core Functional Themes – Who are the different users of Apache Metron? What are the core functional themes? What has been the focus for the first release?
  • Part 3: Apache Metron Tech Preview 1 – A walk through of what the Apache Metron community has been working on for the last 4 months. By the end of the third blog, you will have reached a very good understanding of what is offered in Metron Tech Preview 1, as well as how to install Apache Metron on AWS or single node vagrant VM, deploy and build on top of it.
  • Part 4: Apache Metron UI and Finding a Needle in the Haystack Use Case – We will walk through the Metron UI components and how SOC Analyst would use it for common Metron use cases.
  • Part 5: Apache Metron Tech Preview 1 Under the Covers – We will deep dive into some of the key Metron TP 1 features describing the design and architecture.
  • Part 6: Apache Metron What’s Next – Now, with a solid understanding of TP1, the team provides a glimpse into what’s next for cyber security analytics.

Each of these blogs will provide an intro to their respective topics and the deeper level details will be continued in HCC articles in the Hortonworks Community Connection in the new CyberSecurity Track.

Roots of Apache Metron

To understand Apache Metron, we have to first start with the origins of the project which emerged from the Cisco Project called OpenSOC. The below diagram highlights some of the key events in the history of Apache Metron.

2005 to 2008

The Problem – Cyber crime spiked significantly, and a severe shortage of security talent arose. The first companies alerted to this issue were high-profile banks and large organizations whose proprietary information was of interest to state-sponsored agents. The best investigators and analysts were gobbled up by multinational banking and financial services firms, large hospitals, telcos, and defense contractors.

The Rise of a New Industry, the Managed SOC – Those who could not acquire security talent were still in need of a team. Cisco was sitting on a gold mine of security talent accumulated over the years. Utilizing this talent, they produced a managed service offering built around managed security operations centers.

Post 2008

The Age of Big Data Changed Everything – The Age of Big Data arrived, bringing more streaming data, virtualized infrastructure, data centers emitting machine exhaust from VMs, and Bring Your Own Device programs. The amount of data exploded, and so did the cost of the required tools, such as traditional SIEMs. These tools became cost-prohibitive as they moved to data-driven licensing structures. Cisco’s ability to operate the managed SOC with these tools was in jeopardy, and security appliance vendors took control of the market.

2013

OpenSOC is Born and Hadoop Matures – Cisco decided to build a toolset of their own. They didn’t just want to replace the existing tools; they wanted to improve and modernize them, taking advantage of open source. Cisco released its managed SOC service to the community as Hadoop matured and Storm became available. It was a perfect combination of a use case need and technology. OpenSOC was the first project to take advantage of Storm, Hadoop, and Kafka together, and to modernize legacy approaches into a forward-looking paradigm.

September 2013 through April 2015

The Origins of Apache Metron – For about 24 months, a Cisco team led by their chief data scientist, James Sirota, with the help of a Hortonworks team led by platform architect Sheetal Dolas, worked to create a next-generation managed SOC service built on top of open source big data technologies. The Cisco OpenSOC managed SOC offering went into production for a number of customers in April 2015. A short time later, Cisco made a couple of acquisitions that brought in third-party technologies, transforming OpenSOC into a closed-source, hardware-based product.

October 2015

OpenSOC Chief Data Scientist Joins Hortonworks – James Sirota, the chief data scientist and lead of the Cisco OpenSOC initiative, leaves Cisco to join Hortonworks. Over the next four months, James builds a rock-star engineering team at Hortonworks focused on building an open-source CyberSecurity application.

December 2015

Metron Accepted into Apache Incubation – Hortonworks, with the help and support of key Apache community partners, including ManTech, B23 and others, submit Metron (renamed from OpenSOC) as an Apache incubator project. In December of 2015, the project is accepted into Apache incubation. Hortonworks and the community innovate at impressive speeds to add new features to Apache Metron and harden the platform. The Metron team builds an extensible, open architecture to account for the variety of tools used in customer environments (thousands of firewalls, thousands of domains and a multitude of Intrusion Detection Systems). Metron’s open approach makes it much easier to tailor to the community’s use cases.

April 2016

First official Release of Apache Metron 0.1 – After 4 months of hard work and rapid innovation by the Metron community, Apache Metron’s first release Metron 0.1 is cut.

Given Hortonworks’ proven commitment to the Apache Software Foundation process and our track record of creating and leading robust communities, we feel uniquely qualified to bring this important technology and its capabilities to the broader open source community. Without Hortonworks, the Apache Metron project would not exist today!

Understanding Apache Metron Deeper

To get a deeper level understanding of Apache Metron, continue this blog in the following article in the Hortonworks Community Connection: Apache Metron Explained!

About the Authors

Bio: George Vetticaden is a Principal Architect at Hortonworks, Senior Product Owner/Manager for Metron/CyberSecurity, and committer on the Apache Metron project. Over the last 4 years at Hortonworks, George has spent time in the field with enterprise customers helping them build big data solutions on top of Hadoop. In his previous role at Hortonworks, George was the Director of Solutions Engineering where he led a team of 15 Big Data Senior Solution Architects helping large enterprise customers with use case inception, design, architecture, to implementation of use cases monetizing data with Hadoop. George graduated from Trinity University with a BA in Computer Science.

(LinkedIn Profile: https://www.linkedin.com/in/georgevetticaden)

 


Bio: James Sirota is Director of Security Solutions at Hortonworks and committer on the Apache Metron project. Previously James was the Chief Data Scientist at Cisco focused on Big Data security analytics, and spearheaded OpenSOC. His primary expertise is in the design and implementation of Big Data platforms on top of Hadoop, MapReduce, Yarn, Storm, Kafka, Elastic Search and Flume. James holds a Data Science degree, a Master’s in Computer Engineering and is a licensed information security professional.

(LinkedIn Profile: https://www.linkedin.com/in/jsirota )

 

The post Kicking off the Apache Metron Tech Preview 1 Blog Series appeared first on Hortonworks.

Apache Metron User Personas and Core Functional Themes


On Tuesday, April 12th, we released the first post of our multi-part big data cybersecurity analytics blog series, Roots of Apache Metron, authored by the Hortonworks product management and engineering teams, to announce the Apache Metron 0.1 release. Built with the Apache community, Metron is a next-generation cyber security application that detects and responds to advanced persistent threats. Security Operations Centers (SOCs) can receive alerts on suspicious events as a result of filtering, enriching, storing, and analyzing telemetry data, or “data in motion” (logs, network, packet capture, etc.). In the HCC article Metron Explained, George Vetticaden, Hortonworks Principal Architect and Cybersecurity Product Manager, discusses the roots of Apache Metron and traces a telemetry event as it flows across the platform. This leads to the following questions, which we will cover here as well as in the HCC article Apache Metron User Personas and Why Metron?:

  • Who will be the different users of Apache Metron?
  • What are the functional themes of Apache Metron?
  • Why Metron? A Data Scientist Perspective
  • Why Metron? A SOC Analyst & Investigator Perspective

Bridging the Cybersecurity Talent Gap

In the U.S. alone, companies posted 49,493 jobs requiring Certified Information Systems Security Professional (CISSP) certification last year; however, there are only 65,362 CISSP holders, the majority of whom are already employed (Burning Glass Technologies, Job Market Intelligence: Cybersecurity Jobs 2015). Metron helps organizations bridge this talent gap by building capabilities into their people and processes. There are six different security professionals in the SOC who can all benefit from using Metron; each has well-defined responsibilities and different objectives. Metron is built to help the SOC and its people scale with real-time data ingestion, correlated telemetry, an automated incident response process, easier data access and search, vulnerability management, and integration with external threat intel sources. In a later blog we will cover Metron’s SOC maturity model and how organizations move through its levels toward the analytics and machine learning that big data platforms like Hadoop and Metron’s security analytics enable.

Apache Metron’s Consumers

There are six user personas that Metron aims to target.

Metron User Personas

For a more in depth look at Metron’s user personas, please see the HCC article: Apache Metron User Personas and Why Metron?.

Apache Metron Core Functional Themes

We will now describe the four core functional themes that Metron will focus on. As the community around Metron continues to grow, new features and enhancements will be prioritized across these four themes.

The 4 core functional themes are the following:

metron-functional-themes

Apache Metron Release 0.1 and its Target Personas and Themes

Over the last 4 months, the community, led by Hortonworks, has been hard at work on Apache Metron’s first release (Metron 0.1).

Now that we have highlighted the user personas and core themes for Metron, the following depicts where the engineering focus has been for Metron 0.1.

Metron TP1 core themes

As the diagram above illustrates, the key focus areas for Metron 0.1 are the following:

  • The platform theme was the primary focus. Before we could focus on the UI and support more telemetry data sources, we needed to ensure that the platform was rock solid. This meant providing an easy way to provision this very complex app. In addition, considerable work went into refactoring the code base and addressing technical debt, including making the code simpler and easier to maintain, adding new data sources in a declarative manner, performance and extensibility improvements, and improving the quality of the code.
  • The persona of focus was the Security Platform Engineer.
  • Metron 0.1 offers dashboard views for the SOC analyst and investigator.

More Details on HCC

More details on Metron user personas and what those users can do with the platform that they couldn’t do with traditional security tools can be found in the following HCC article: Apache Metron User Personas and Why Metron?.


The post Apache Metron User Personas and Core Functional Themes appeared first on Hortonworks.

Apache Metron Tech Preview 1 – Come and Get It!


In Kicking Off the Apache Metron Tech Preview 1 Blog Series, we introduced the origins of the Apache Metron™  project. In Part 2, Apache Metron User Personas and Core Functional Themes, we covered how Metron helps organizations build capabilities in their people and processes to help bridge the cyber security talent gap.

In this third part of our series, we provide an update on what the community has been developing over the past four months. Finally, we conclude by demonstrating how to start working with Apache Metron right now and how to join this rockstar community.

We are proud to announce that the culmination of the last four months of hard work is the first release of Apache Metron, version 0.1, which Hortonworks is making available as Apache Metron Tech Preview 1 (Metron TP1).

Metron TP1 Features

The following are the key capabilities available in Metron TP1, broken out across its four key functional themes.

The details of what’s in Metron Tech Preview 1 can be found in the following HCC article: Apache Metron TP 1.

How do I get Started?

You can spin up Metron TP1 in two ways:

  • Ansible based Vagrant Single Node VM Install – This is a great place to start as an introduction to Apache Metron. Detailed installation instructions can be found in the Hortonworks Community Connection (HCC) Article: Apache Metron TP 1 Install Instructions – Single Node Vagrant Deployment
  • Cloud-based install for a complete 10 Node Metron Cluster using Ambari Blueprints and AWS APIs – If you want a more realistic setup of the Metron app, you can install it on AWS. Keep in mind that this install will spin up 10 m4.xlarge EC2 instances by default. Detailed installation instructions can be found in the HCC: Apache Metron – First Steps in the Cloud

Where do I get Help?

Hortonworks has created a new Community Cybersecurity Track in HCC.  Metron subject matter experts are answering questions and moderating the new Track for anything related to Apache Metron and Cybersecurity. When asking a question about Metron TP1, select “CyberSecurity” Track and add the following tags: “Metron” and “tech-preview”.

See below for more details:

Metron HCS Help

Join the Apache Metron Community

We hope this blog series has gotten you excited about the fantastic work the Apache Metron community is doing. Please consider joining this community of rockstars. If you are interested, follow these eight simple steps:

  1. Subscribe to the Apache Metron user mailing list, user@metron.incubator.apache.org, by sending an email to user-subscribe@metron.incubator.apache.org with “subscribe” in the subject field. This mailing list is for questions about installation, issues you run into, general usage, and so on.
  2. Subscribe to the Apache Metron dev mailing list, dev@metron.incubator.apache.org, by sending an email to dev-subscribe@metron.incubator.apache.org with “subscribe” in the subject field. This mailing list is for contributors with questions about topics such as architecture, or who want to contribute.
  3. Introduce yourself on the user mailing list.
  4. Become familiar with the Metron code base: https://github.com/apache/incubator-metron
  5. Spin up Metron on single node VM as described above.
  6. Join the Metron IRC Channel: apache-metron
  7. Setup your development environment using the following instructions: Metron Development Environment Setup Instructions
  8. Contribute!

How can you Contribute?

Here are some ideas:

  • Do you work with security telemetry data and logs? The quickest way to contribute is to write parsers for different telemetry data sources. It will be helpful if you have access to sample logs emitted from these sources. Some of the security data source parsers needed for ingestion into Metron are: FireEye, Cisco ISE, Lancope, SourceFire, CarbonBlack, BlueCoat, Active Directory, Palo Alto Network, etc. The following provides more details: Parser Component
  • Do you work in a SOC, or are you a designer or UI developer? Over the next couple of releases of Metron, we want to focus on building a next-generation UI for Metron used by SOC analysts and investigators. If this interests you, join the weekly Metron UI community meeting. Details can be found here.
  • Are you familiar with Storm, Kafka, Solr/Elasticsearch, Hadoop, or Kibana? Help us continue to harden and enhance the core Metron platform.
  • Are you a data scientist with some security domain expertise? Join the community to help build out the analytics packs and models using Spark, Python, scikit-learn, Jupyter notebooks, etc.
  • Are you just a rockstar developer? Join the Apache Metron community and contribute to the areas that interest you.

More Details on HCC

To get more details about Apache Metron 0.1, including the Apache Jiras, continue to the Apache Metron Tech Preview 1 article in the Hortonworks Community Connection (HCC).

Enjoy exploring Metron TP1!

Apache Metron and its logo are trademarks of the Apache Software Foundation.  All other trademarks are the property of their respective owners.

About the Authors

Bio: George Vetticaden is a Principal Architect at Hortonworks, Senior Product Owner/Manager for Metron/CyberSecurity, and a committer on the Apache Metron project. Over the last four years at Hortonworks, George has spent time in the field with enterprise customers, helping them build big data solutions on top of Hadoop. In his previous role at Hortonworks, George was the Director of Solutions Engineering, where he led a team of 15 Big Data Senior Solution Architects helping large enterprise customers take use cases that monetize data with Hadoop from inception and design through architecture and implementation. George graduated from Trinity University with a BA in Computer Science.

(LinkedIn Profile: https://www.linkedin.com/in/georgevetticaden)


Bio: James Sirota is Director of Security Solutions at Hortonworks and a committer on the Apache Metron project. Previously, James was the Chief Data Scientist at Cisco focused on big data security analytics, where he spearheaded OpenSOC. His primary expertise is in the design and implementation of big data platforms on top of Hadoop, MapReduce, YARN, Storm, Kafka, Elasticsearch and Flume. James holds a Data Science degree and a Master’s in Computer Engineering, and is a licensed information security professional.

(LinkedIn Profile: https://www.linkedin.com/in/jsirota)


The post Apache Metron Tech Preview 1 – Come and Get It! appeared first on Hortonworks.

Introducing Availability of HDP 2.3 – Part 2


On July 22nd, we introduced the general availability of HDP 2.3. In part 2 of this blog series, we explore notable improvements and features related to Data Access: SQL on Hadoop, Spark 1.3.1, Stream Processing, Systems of Engagement that scale, and HDP Search. We are especially excited about what these data access improvements mean for our […]

The post Introducing Availability of HDP 2.3 – Part 2 appeared first on Hortonworks.

Running Operational Applications (OLTP) on Hadoop using Splice Machine and Hortonworks


Introducing Availability of HDP 2.3 – Part 3


Last week, on July 22nd, we announced the general availability of HDP 2.3. Of the three-part blog series, the first blog summarized the key innovations in the release—ease of use & enterprise readiness and how those are helping deliver transformational outcomes—while the second blog focused on data access innovation. In this final part, we […]

The post Introducing Availability of HDP 2.3 – Part 3 appeared first on Hortonworks.

Fault tolerant Nimbus in Apache Storm


Every day, more and more new devices—smartphones, sensors, wearables, tablets, home appliances—connect together by joining the “Internet of Things.” Cisco predicts that by 2020, there will be 50 billion devices connected to the Internet of Things. Naturally, they all will emit streams of data in short intervals. Obviously, these data streams will have to be stored, will […]

The post Fault tolerant Nimbus in Apache Storm appeared first on Hortonworks.

Apache Metron Use Case: Finding the Needle in the Haystack


In Part 3 of the Apache Metron announcement series, Apache Metron Tech Preview 1 – Come and Get it! , we outlined Apache Metron 0.1 release’s new features and enhancements. Then we demonstrated how to deploy on a single node VM using vagrant and a cloud-based install for a complete 10 node Metron cluster using […]

The post Apache Metron Use Case: Finding the Needle in the Haystack appeared first on Hortonworks.

What’s New in Apache Storm 1.0 – Part 1 – Enhanced Debugging


Debugging distributed systems can be difficult largely because they are designed to run on many (possibly thousands) of hosts in a cluster. This process typically involves monitoring and analyzing log files spread across the cluster, and if the necessary information is not being logged, service restarts and job redeployment may be required. Not only is […]

The post What’s New in Apache Storm 1.0 – Part 1 – Enhanced Debugging appeared first on Hortonworks.

Apache Metron Tech Preview 2 Available Now!


Accelerated Threat Triage and Expanded Deployment Options Two months ago, the Metron Engineering and PM team released Technical Preview 1 of Apache Metron based on the 0.1 release. We shared our vision for an open, community-based cybersecurity solution that provides real-time, cross-referenced and contextualized big data to combat cyber threats. Apache Metron Reference Architecture As the above […]

The post Apache Metron Tech Preview 2 Available Now! appeared first on Hortonworks.

Hive LLAP Technical Preview enables sub-second SQL on Hadoop and more


The most significant new feature in Apache Hive 2, to be included in the upcoming HDP 2.5 release, is a technical preview of LLAP (Live Long and Process). LLAP enables sub-second SQL analytics on Hadoop by intelligently caching data in memory with persistent servers that instantly process SQL queries. Since LLAP is […]

The post Hive LLAP Technical Preview enables sub-second SQL on Hadoop and more appeared first on Hortonworks.

Top Articles and Questions from HCC last week


It has been another exciting week on Hortonworks Community Connection HCC. We have lots of great technical content and are continuing to see great activity. We recommend the following assets from last week: Top Articles from HCC Adding KDC Administrator Credentials to the Ambari Credential Store by:rlevas Rack Awareness by:rbiswas Spark+Pycharm+Pybuilder on Docker by:smanjee YARN […]

The post Top Articles and Questions from HCC last week appeared first on Hortonworks.


Top 5 Articles on Hadoop


It has been another exciting week on Hortonworks Community Connection HCC. We have lots of great technical content and are continuing to see great activity. We recommend the following assets from last week: Top Articles from HCC Disaster recovery and Backup best practices in a typical Hadoop Cluster :Series 1 Introduction by:rbiswas Disaster recovery plan […]

The post Top 5 Articles on Hadoop appeared first on Hortonworks.

Top 5 Articles and Questions from HCC


It has been another exciting week on Hortonworks Community Connection HCC. We continue to see great activity and recommend the following assets from last week. Top Articles from HCC Horses for Courses: Apache Spark Streaming and Apache Nifi by:vvaks Comparing Apache Nifi and Apache Spark Streaming for different streaming and IOT use cases Data Analysis […]

The post Top 5 Articles and Questions from HCC appeared first on Hortonworks.

Coming in HDP 2.5: Incremental Backup and Restore for Apache HBase and Apache Phoenix


The need to address Business Continuity and Disaster Recovery (BCDR) concerns is well known to anyone who runs production systems. This blog introduces HBase’s new backup and restore capabilities, which give HBase the ability to perform full and incremental backups across clusters and into the cloud. When combined with real-time replication, this new incremental backup […]

The post Coming in HDP 2.5: Incremental Backup and Restore for Apache HBase and Apache Phoenix appeared first on Hortonworks.

HCC — Top 5 Articles and Questions from the week


It has been another exciting week on Hortonworks Community Connection HCC. We continue to see great activity and recommend the following assets from last week. Top Articles from HCC Phoenix HBase Tuning – Quick Hits by:smanjee HBase tuning like any other service within the ecosystem requires understanding of the configurations and the impact (good or […]

The post HCC — Top 5 Articles and Questions from the week appeared first on Hortonworks.

Top Articles on HCC — Hortonworks Community


It has been another exciting week on Hortonworks Community Connection HCC. We continue to see great activity and recommend the following assets from last week. Top Articles from HCC HDF installation on EC2 by:mpandit Hortonworks DataFlow (HDF) powered by Apache NiFi, Kafka and Storm, collects, curates, analyzes and delivers real-time data from the IoAT to […]

The post Top Articles on HCC — Hortonworks Community appeared first on Hortonworks.
