Helpful article on Apache Spark with use cases

March 4, 2017

I recently wrote Spark for Dummies in partnership with IBM. For those curious about this highly interesting and innovative technology – and the numerous scenarios where it can add value – there are increasing numbers of helpful online resources. A good example of what I mean is a recent article by Radek Ostrowski from Toptal. He provides a concise Spark overview, along with some sample use cases.

I’ll continue to cross-reference online resources like this as I run across them. If you’d like to read more about all things Big Data, be sure to check out some of my other related postings.

Spark for Dummies is now available

January 1, 2017

A few years ago, I wrote Hadoop for Dummies, which presented an executive-level overview of Hadoop, its capabilities, and its amazing potential. Since then, the Big Data world has continued its relentless march forward, with Apache Spark serving as one of the most exciting and well-adopted new technologies. I’m happy to announce that I’ve just written a companion book dedicated to Spark. Here are the major topics that I cover in this book:

  • Spark’s history
  • How it works
  • Why it’s such an important Big Data breakthrough
  • Real-world use cases
  • How Spark, MapReduce, and Hadoop can work together
  • How you can deploy it in your enterprise
  • Best practices

You can download a copy here.

This book – which was sponsored by IBM – is a great example of the kinds of high-quality technical marketing content developed at Think88. These include white papers, case studies, evaluation guides, technical presentations, and market research.

Presenting a Webinar on Delivering Data Security with Hadoop and the IoT

July 18, 2016

On August 9, I’ll be teaming with Reiner Kappenberger from Hewlett Packard Enterprise to explore some of the most pressing security implications of Hadoop and the Internet of Things (IoT). The webinar is hosted by the IT GRC Forum, and here’s what we’ll be covering:

The Internet of Things (IoT) is here to stay, and Gartner predicts there will be over 26 billion connected devices by 2020. This is driving an explosion of data, which offers tremendous opportunity for organizations to gain business value, and Hadoop has emerged as the key component to make sense of the data and realize the maximum value. On the flip side, the surge of new devices has increased the potential for hackers to wreak havoc, and Hadoop has been described as the biggest cybercrime bait ever created.

Data security is a fundamental enabler of the IoT, and if it is not prioritized the business opportunity will be undermined, so protecting company data is more urgent than ever before. The risks are huge and Hadoop comes with few safeguards, leaving it to organizations to add an enterprise security layer. Securing multiple points of vulnerability is a major challenge, although when armed with good information and a few best practices, enterprise security leaders can ensure attackers will glean nothing from their attempts to breach Hadoop. In this webinar we will discuss some steps to identify what needs protecting and apply the right techniques to protect it before you put Hadoop into production.

If you’d like to join us, register here.

Hadoop Buyers Guide is now available

November 25, 2013

Choosing a Hadoop platform can be confusing: there are several great alternatives on the market right now. Some of these offerings require you to handle all aspects of installation, configuration, and administration on your own, while others deliver a more comprehensive, innovative, and integrated solution yet are still faithful to Hadoop’s open source heritage.

I recently put together a concise eBook that you can use to better understand your options. You can view the guide here.

Introducing a half-day Big Data security training class

August 4, 2013

Beginning on September 20, I’ll be teaching a half-day Big Data security Webinar. These classes will take place once a month, and will cover the following topics:

Big Data information categories

  • Relational
  • Columnar/analytics
  • Key/value
  • Document store
  • Graph
  • XML
  • NoSQL

Big Data security requirements

  • Legal and regulatory
  • Internal guidelines
  • Industry standards
  • Privacy
  • User access

Big Data security risks

  • Metadata
  • Outsourcing
  • Distributed processing (e.g. MapReduce, Hadoop, and Cassandra)
  • Overt attacks
  • Covert attacks

Best practices for securing Big Data

  • Setting realistic security goals
  • Reducing surface area for attacks
  • Protecting physical assets
  • Safeguarding the network
  • Encrypting data
  • Data obfuscation via tokenization and masking
  • Retiring data
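
As a rough illustration of the "data obfuscation via tokenization and masking" item above, here is a minimal Python sketch. It is not taken from the class materials: the names `mask_card_number` and `TokenVault` are hypothetical, and the in-memory dictionaries stand in for what would be a hardened, access-controlled token store in a real deployment.

```python
import secrets


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Masking: replace all but the last `visible` digits with asterisks."""
    digits = [c for c in card_number if c.isdigit()]
    cut = max(len(digits) - visible, 0)
    return "*" * cut + "".join(digits[cut:])


class TokenVault:
    """Tokenization: swap a sensitive value for a random token.

    Hypothetical in-memory vault for illustration only. The token has
    no mathematical relationship to the original value, so it can only
    be reversed by looking it up in the vault.
    """

    def __init__(self) -> None:
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so the same value always maps to one token.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Raises KeyError for unknown tokens.
        return self._token_to_value[token]


if __name__ == "__main__":
    vault = TokenVault()
    token = vault.tokenize("4111-1111-1111-1111")
    print(mask_card_number("4111-1111-1111-1111"))
    print(token, "->", vault.detokenize(token))
```

The difference in intent: masking is one-way and suits display or test data, while tokenization stays reversible for systems that are authorized to query the vault.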

To allow for maximum student interaction, classes will be limited to 10 people. You can register here.

Amazon Redshift – interesting new Big Data Warehouse As A Service offering

March 31, 2013

A while back, I co-authored a white paper about the various storage options offered by Amazon Web Services (AWS). In that paper, I described each of the key AWS data management products, such as Simple Storage Service (S3), Elastic Block Store (EBS), and SimpleDB. I also provided some use cases showing how all of these services can work together.

Amazon continually upgrades and improves its offerings, and they’ve recently announced beta availability for Redshift – a cloud-based service that lets you use the AWS infrastructure to run analytics and business intelligence applications on terabytes – or even petabytes – of information that you upload to Amazon.

In future posts, I’ll be writing more about how these types of hosted solutions have the potential to level the playing field and transform the way Big Data analytics are handled by organizations of all sizes. For now, I encourage you to consider cloud-based products for even the largest business intelligence applications.

Teaching a workshop entitled “Foundations of Big Data from A to Z”

February 16, 2013

Recently, I blogged about a talk I’m giving in Boston at the Conference on Big Data Security. I’m happy to announce that I’ll also be teaching a comprehensive one-day workshop on Big Data. Here’s what I’ll be covering on Tuesday, July 16:

  • A realistic, vendor-agnostic overview of the current Big Data security landscape
  • Big Data information management categories including: in-memory databases, key/value stores, graph databases, and file/object repositories
  • Examination and explanation of the most widespread technologies such as Amazon Web Services, Bigtable, and Hadoop
  • Understanding how all of these disparate solutions co-exist without security chaos
  • Pinpointing the intrinsic non-technical security risks present in a big data environment: regulatory, legal, industry, and Service Level Agreements
  • Creating a “defense-in-depth” approach to protecting Big Data for your shop
  • Real-world scenarios on what works and why

If you’re interested, you can register here.
