Tuesday, April 7, 2015

Now that many enterprises are seeing value in big data analysis, it may be time for their database administrators and data warehouse managers to get involved.
Oracle has released a new extension for its Oracle Data Integrator middleware that allows DBAs and data warehouse experts to treat big data repositories as just another data source, alongside their structured databases and data warehouses.
The Oracle Data Integrator for Big Data "makes a non-Hadoop developer instantly productive on Hadoop," said Jeff Pollock, Oracle vice president of product management.
Big data platforms such as Hadoop and Spark were initially geared more toward programmers than DBAs, relying on languages such as Java and Python, Pollock said. Traditional enterprise data analysis, by contrast, has largely been handled by DBAs and ETL (extract, transform, and load) specialists using SQL and visually oriented, drag-and-drop tools.
The Data Integrator for Big Data extends Oracle's ODI product to handle big data sources.
ODI lets organizations pull together data from multiple sources and formats, such as relational data in IBM or Microsoft databases and material residing in Teradata data warehouses, so connecting big data repositories to ODI was a natural step.
With the extension, "you don't have to retrain a database administrator on Hive for Hadoop. We can now give them a toolkit that they will be naturally familiar with," Pollock said. The administrator can work with familiar concepts such as entities and relations, and 4GL data flow mapping. The software "automatically generates the code in the different underlying languages" needed to complete the job, Pollock said.
The software can work with any Hadoop or Spark deployment, and doesn't require software installation on any of the data nodes. Using the power of distributed computing, Data Integrator for Big Data uses the nodes where the data is stored to carry out all the computations needed.
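To make that data-locality idea concrete, here is a hedged sketch in PySpark -- not ODI's generated code, which the article does not show -- where the aggregation runs on the cluster nodes that already hold the data rather than on a separate ETL server. The HDFS paths and column names are illustrative assumptions.

```python
# Hedged illustration of "send the computation to the data" using PySpark
# (not Oracle ODI output). Paths and column names are made up for the example.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pushdown-sketch").getOrCreate()

orders = spark.read.parquet("hdfs:///warehouse/orders")   # data stays on the cluster
summary = (orders.groupBy("customer_id")
                 .sum("amount"))                           # computed where the data lives
summary.write.mode("overwrite").parquet("hdfs:///warehouse/orders_by_customer")
```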
A retail organization could use the software to analyze its customers' purchasing histories. Real-time data capture systems such as Oracle GoldenGate 12c could move transactional data into a Hadoop cluster, where it then can be prepared for analysis by ODI.
Oracle is not alone in attempting to bridge the new big data tools with traditional data analysis software. Last week, Hewlett-Packard released a software package that allows customers to integrate HP's Vertica analysis database with HP Autonomy's IDOL (Intelligent Data Operating Layer) platform, providing a way for organizations to speedily analyze large amounts of unstructured data.

Dawn of the data center operating system

Virtualization has been a key driver behind every major trend in software, from search to social networks to SaaS, over the past decade. In fact, most of the applications we use -- and cloud computing as we know it today -- would not have been possible without the server utilization and cost savings that resulted from virtualization.

But now, new cloud architectures are reimagining the entire data center. Virtualization as we know it can no longer keep up.

As data centers transform, the core insight behind virtualization -- that of carving up a large, expensive server into several virtual machines -- is being turned on its head. Instead of divvying the resources of individual servers, large numbers of servers are aggregated into a single warehouse-scale (though still virtual!) “computer” to run highly distributed applications.

Every IT organization and developer will be affected by these changes, especially as scaling demands increase and applications get more complex every day. How can companies that have already invested in the current paradigm of virtualization understand the shift? What’s driving it? And what happens next?

Virtualization then and now

Perhaps the best way to approach the changes happening now is in terms of the shifts that came before it -- and the leading players behind each of them.

That story begins in the mainframe era, with IBM. Back in the 1960s and 1970s, the company needed a way to cleanly support older versions of its software on newer-generation hardware and to turn its powerful computers from a batch system that ran one program at a time to an interactive system that could support multiple users and applications. IBM engineers came up with the concept of a “virtual machine” as a way to carve up resources and essentially timeshare the system across applications and users while preserving compatibility.

This approach cemented IBM’s place as the market leader in mainframe computing.

Fast-forward to the early 2000s and a different problem was brewing. Enterprises were faced with data centers full of expensive servers that were running at very low utilization levels. Furthermore, thanks to Moore's Law, transistor counts had been doubling roughly every 18 months and processors had moved to multiple cores -- yet the software stack was unable to effectively utilize the newer processors and all those cores.

Again, the solution was a form of virtualization. VMware, then a startup out of Stanford, enabled enterprises to dramatically increase the utilization of their servers by allowing them to pack multiple applications into a single server box. By embracing all software (old and new), VMware also bridged the gap between the lagging software stack and modern, multicore processors. Finally, VMware enabled both Windows and Linux virtual machines to run on the same physical hosts -- thereby removing the need to allocate separate physical servers to those clusters within the same data center.

Virtualization thus established a stranglehold in every enterprise data center.

But in the late 2000s, a quiet technology revolution got under way at companies like Google and Facebook. Faced with the unprecedented challenge of serving billions of users in real time, these Internet giants quickly realized they needed to build custom-tailored data centers with a hardware and software stack that aggregated (versus carved) thousands of servers and replaced larger, more expensive monolithic systems.

What these smaller and cheaper servers lacked in computing power they made up in number, and sophisticated software glued it all together to build a massively distributed computing infrastructure. The shape of the data center changed. It may have been made up of commodity parts, but the results were still orders of magnitude more powerful than traditional, state-of-the-art data centers. Linux became the operating system of choice for these hyperscale data centers, and as the field of devops emerged as a way to manage both development and operations, virtualization lost one of its core value propositions: the ability to simultaneously run different “guest” operating systems (that is, both Linux and Windows) on the same physical server.

Microservices as a key driver

But the most interesting changes driving the aggregation of virtualization are on the application side, through a new software design pattern known as microservices architecture. Instead of monolithic applications, we now have distributed applications composed of many smaller, independent processes that communicate with each other using language-agnostic protocols (HTTP/REST, AMQP). These services are small and highly decoupled, and they're focused on doing a single small task.
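As a minimal illustration of the pattern (not any particular company's service), the following Python sketch uses only the standard library to expose one small task over HTTP/JSON; the endpoint name and port are assumptions made for the example.

```python
# A minimal sketch of a single microservice: one small, independent process
# exposing one task over plain HTTP/JSON. Standard library only; the "/price"
# endpoint, the payload, and port 8080 are illustrative assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class PriceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/price":
            body = json.dumps({"sku": "demo-sku", "price_usd": 9.99}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), PriceHandler).serve_forever()
```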

Microservices quickly became the design pattern of choice for a few reasons.

First, microservices enable rapid cycle times. The old software development model of releasing an application once every few months was too slow for Internet companies, which needed to deploy new releases several times a week -- or even several times a day -- in response to engagement metrics and the like. Monolithic applications were clearly unsuitable for this kind of agility because of their high cost of change.

Second, microservices allow selective scaling of application components. The scaling requirements for different components within an application are typically different, and microservices allowed Internet companies to scale only the functions that needed to be scaled. Scaling older monolithic applications, on the other hand, was tremendously inefficient. Often the only way was to clone the entire application.

Third, microservices support platform-agnostic development. Because microservices communicate across language-agnostic protocols, an application can be composed of microservices running on different platforms (Java, PHP, Ruby, Node, Go, Erlang, and so on) without any issue, thereby benefiting from the strengths of each individual platform. This was much more difficult (if not impractical) to implement in a monolithic application framework.
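A short companion sketch shows why the protocol, not the platform, is what matters: the client below consumes two services over plain HTTP/JSON without knowing what language either one is written in. Both URLs are assumptions (the first could be the Python sketch above; the second could be a service written in Go, Java, or anything else).

```python
# Sketch: because the services speak plain HTTP/JSON, this client neither knows
# nor cares how each one is implemented. URLs are illustrative assumptions.
import json
from urllib.request import urlopen

def fetch_json(url: str) -> dict:
    with urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode())

price = fetch_json("http://localhost:8080/price")          # e.g. the Python service above
inventory = fetch_json("http://localhost:8081/inventory")  # e.g. a service in another language
print(price, inventory)
```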

Delivering microservices

The promise of the microservices architecture would have remained unfulfilled in the world of virtual machines. To meet the demands of scaling and costs, microservices require both a light footprint and lightning-fast boot times, so hundreds of microservices can be run on a single physical machine and launched at a moment’s notice. Virtual machines lack both qualities.

That’s where Linux-based containers come in.

Both virtual machines and containers are means of isolating applications from hardware. However, unlike virtual machines -- which virtualize the underlying hardware and contain an OS along with the application stack -- containers virtualize only the operating system and contain only the application. As a result, containers have a very small footprint and can be launched in mere seconds. A physical machine can accommodate four to eight times more containers than VMs.
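A quick, hedged way to feel the difference is to time a container's full start-and-exit cycle. The sketch below assumes Docker is installed and that the small alpine image has already been pulled locally (the first run would otherwise include download time); it is an illustration, not a rigorous benchmark.

```python
# Rough illustration only: measure how quickly a container can start, run a
# trivial command, and exit. Assumes Docker is installed and "alpine" is pulled.
import subprocess
import time

start = time.perf_counter()
subprocess.run(["docker", "run", "--rm", "alpine", "/bin/true"], check=True)
elapsed = time.perf_counter() - start
print(f"Container started, ran, and exited in {elapsed:.2f} seconds")
```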

Containers aren’t actually new. They have existed since the days of FreeBSD Jails, Solaris Zones, OpenVZ, LXC, and so on. They’re taking off now, however, because they represent the best delivery mechanism for microservices. Looking ahead, every application of scale will be a distributed system consisting of tens if not hundreds of microservices, each running in its own container. For each such application, the ops platform will need to keep track of all of its constituent microservices -- and launch or kill those as necessary to guarantee the application-level SLA.
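What "keep track of and launch or kill as necessary" means in practice is a reconciliation loop. The sketch below is a deliberately simplified, hypothetical version using the Docker CLI; the service names, labels, and image names are assumptions, and a real orchestration platform would also handle health checks, placement, scale-down, and SLA metrics.

```python
# Hypothetical reconciliation loop: compare desired container counts per
# microservice with what is running, then launch containers to close the gap.
# Service names, labels, and image names are illustrative assumptions.
import subprocess

DESIRED = {"checkout": 3, "catalog": 5}

def running_count(service: str) -> int:
    out = subprocess.run(
        ["docker", "ps", "-q", "--filter", f"label=service={service}"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return len(out)

def reconcile() -> None:
    for service, want in DESIRED.items():
        have = running_count(service)
        for _ in range(want - have):  # too few instances: launch more
            subprocess.run(
                ["docker", "run", "-d", "--label", f"service={service}",
                 f"{service}-image"],
                check=True,
            )
        # (a real platform would also scale down and watch health/SLA metrics)

if __name__ == "__main__":
    reconcile()
```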

Why we need a data center operating system

All data centers, whether public or private or hybrid, will soon adopt these hyperscale cloud architectures -- that is, dumb commodity hardware glued together by smart software, containers, and microservices. This trend will bring to enterprise computing a whole new set of cloud economics and cloud scale, and it will introduce entirely new kinds of businesses that simply were not possible earlier.

What does this mean for virtualization?

Virtual machines aren’t dead. But they can’t keep up with the requirements of microservices and next-generation applications, which is why we need a new software layer that will do exactly the opposite of what server virtualization was designed to do: Aggregate (not carve up!) all the servers in a data center and present that aggregation as one giant supercomputer. Though this new level of abstraction makes an entire data center seem like a single computer, in reality the system is composed of millions of microservices within their own Linux-based containers -- while delivering the benefits of multitenancy, isolation, and resource control across all those containers.

Think of this software layer as the “operating system” for the data center of the future, though the implications of it go beyond the hidden workings of the data center. The data center operating system will allow developers to more easily and safely build distributed applications without constraining themselves to the plumbing or limitations (or potential loss) of the machines, and without having to abandon their tools of choice. They will become more like users than operators.

This emerging smart software layer will soon free IT organizations -- traditionally perceived as bottlenecks on innovation -- from the immense burden of manually configuring and maintaining individual apps and machines, and allow them to focus on being agile and efficient. They too will become more strategic users than maintainers and operators.

The aggregation of virtualization is really an evolution of the core insight behind virtual machines in the first place. But it’s an important step toward a world where distributed computing is the norm, not the exception.

Sudip Chakrabarti is a partner at a16z where he focuses on infrastructure software, security, and big data investments. Peter Levine is a general partner at Andreessen Horowitz. He has been a lecturer at both MIT and Stanford business schools and was the former CEO of XenSource, which was acquired by Citrix in 2007. Prior to XenSource, Peter was EVP of Strategic and Platform Operations at Veritas Software. Follow him on his blog, http://peter.a16z.com/, and on Twitter @Peter_Levine.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Microsoft closes acquisition of R software and services provider

Microsoft today closed its acquisition of Revolution Analytics, a commercial provider of software and services for the R programming language, making it a wholly owned subsidiary.

"R is the world's most popular programming language for statistical computing and predictive analytics, used by more than 2 million people worldwide," says Joseph Sirosh, corporate vice president of Information Management & Machine Learning at Microsoft.

"Revolution has made R enterprise-ready with speed and scalability for the largest data warehouses and Hadoop systems," he adds. "For example, by leveraging Intel's Math Kernel Library (MKL), the freely available Revolution R Open executes a typical R benchmark 2.5 times faster than the standard R distribution and some functions, such as linear regression, run up to 20 times faster. With its parallel external memory algorithms, Revolution R Enterprise is able to deliver speeds 42 times faster than competing technology from SAS."

Microsoft announced its plans to acquire Revolution Analytics in January, citing its desire to help use the power of R and data science to unlock insights with advanced analytics.

With the acquisition now closed, Sirosh says Microsoft plans to build R and Revolution's technology into its data platform products, making it available on-premises, on Azure public cloud environments and in hybrid environments.

"For example, we will build R into SQL Server to provide fast and scalable in-database analytics that can be deployed in an enterprise customer's datacenter, on Azure or in a hybrid combination," Sirosh says.

"In addition, we will integrate Revolution's scalable R distribution into Azure HDInsight and Azure Machine Learning, making it much easier and faster to analyze big data, and to operationalize R code for production purposes," Sirosh says. "We will also continue to support running Revolution R Enterprise across heterogeneous platforms including Linux, Teradata and Hadoop deployments. No matter where data lives, customers and partners will be able to take advantage of R more quickly, simply and cost effectively than ever before."

Open source loves its R

Sirosh adds that Microsoft considers the active and passionate open source community around R an essential element to the programming language's success, and it plans to "support and amplify" Revolution's open source projects, including the Revolution R Open distribution, the ParallelR collection of packages for distributed programming, Rhadoop for running R on Hadoop nodes, DeployR for deploying R analytics in web and dashboard applications, the Reproducible R Toolkit and RevoPemaR for writing parallel external memory algorithms.

Microsoft also plans to continue Revolution's education and training efforts around R, and Sirosh notes it will leverage its global programs and partner ecosystem to do so.

Revolution Analytics CEO Dave Rich is now general manager of Advanced Analytics at Microsoft.

"The CIO and CDO will need an easy-to-use, integrated platform and a vendor partner who simultaneously understands end-user productivity, cloud computing and data platforms," Rich says, describing the "Decision Process Engineering" that he sees dominating the next decade. "Who better to deliver this to companies large and small than Microsoft? All Microsoft needed was a bridge to crowd-sourced innovation on the advanced analytics algorithms and tools power results from big data. Who better than Revolution Analytics? Stay tuned. Now it gets interesting."

This story, "Microsoft closes acquisition of R software and services provider" was originally published by CIO.

Fast and effective malware detection -- for free

Ever discover a site or a service that's brand-new and cool, only to learn it’s been around for years? No, I’m not talking about cat videos. I'm referring to the awesome, free malware analysis site Malwr.

It’s been around since January 2011 and is based on the popular open source analysis software Cuckoo. Malwr takes Cuckoo’s sandbox, throws a front end on it, and adds other related features. I’m not sure if the malware analysis teams at the leading antivirus firms use it (my guess is they have more sophisticated, expensive analysis tools at their disposal), but Malwr is good enough for any disassembling hobbyist. Claudio Guarnieri and Alessandro Tanasi -- respectively, chairman and director of the Netherlands-based Cuckoo Foundation -- created and operate Malwr.

I heard that Malwr got overwhelmed a while ago, running out of resources due to an abundance of users. Now it runs on systems provided by the long-trusted Shadowserver Foundation.

To use it, go to malwr.com and choose the Submit option at the top of the page. Then browse to your malware sample, upload it for inspection, answer the simple math question that serves as a CAPTCHA, and click Analyze.
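One optional habit before uploading: fingerprint the sample locally so you can cross-check the hashes in Malwr's report against the exact file you submitted. A minimal Python sketch (the file name is an illustrative assumption):

```python
# Compute common hash fingerprints of a sample before submitting it,
# so they can be compared against the analysis report afterward.
import hashlib

def fingerprints(path: str) -> dict:
    with open(path, "rb") as f:
        data = f.read()
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

print(fingerprints("suspicious_sample.exe"))  # hypothetical file name
```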

You can then pore through the results. The analysis includes:

  • Hash fingerprinting results
  • Submission to Virustotal.com
  • Screenshots of the program during execution and installation
  • Static analysis
  • Dynamic analysis
  • Behaviors
  • Domains contacted
  • Hosts contacted
  • Whether the program makes itself autorun on Windows systems
  • Registry keys created
  • Files dropped
  • Mutexes created
  • Files and registry keys queried, failures, and successes
  • Network activity
  • HTTPS packets generated

There's a whole lot more. I was delighted to see the level of information delivered. It’s definitely enough to determine if the program in question is doing something shady or unexpected. It’s not perfect -- and malware is often written specifically to hide bad behaviors from tools like Malwr -- but it’s 100 times faster than trying to do the analysis on your own.

I downloaded a suspicious “registry cleaner” to analyze. Here are some screenshots from the results:

[Screenshots: Malwr malware detection results]
In this case, I didn’t see anything that jumped out as malicious, but I saw enough that I didn’t want to run it, including the report that TrendMicro labels it as "suspicious." What bothered me more was that it tried to create a file, netmsg.dll, in my System32 folder. There are a million reasons why that would be normal, but I didn’t like seeing it from a newly installed registry cleaner program, most of which are full of rogue code anyway.

It was great that I didn't have to run the malware sample on my own desktop, although I could have done so safely in a newly created VM with additional monitoring tools installed -- or even used Cuckoo directly. Instead, I selected the file, uploaded it to Malwr, and waited a minute or two while it did all the hard work -- no setup or configuration, no sweat, and no messy cleanup. One and done. I love it.

Though I'm late to the discovery, I know for sure that Malwr will be one of my go-to tools -- along with Sysinternals Process Explorer and Virustotal.com -- for a long time.

Saturday, March 28, 2015

The amazing algorithm behind Google Maps and Apple Maps

Even before the advent of Google Maps, folks were using programs like MapQuest to print up directions and figure out the shortest route between any two locations. While it's easy to take mapping apps for granted these days, there are some interesting mathematical algorithms at work behind the scenes that make it all possible.
Not many people are aware of this, but the computer algorithm that makes mapping programs so convenient dates all the way back to 1956, when a programmer named Edsger W. Dijkstra needed to come up with a solvable problem as a means to showcase the power of a new ARMAC computer. Dijkstra himself is a bit of a computing legend, having received the Turing Award in 1972.
Seeking to come up with a relatable problem, Dijkstra settled on "the shortest way to travel from Rotterdam to Groningen."
VICE reports:
“For a demonstration for noncomputing people you have to have a problem statement that non-mathematicians can understand,” Dijkstra recalled in an interview not long before his 2002 death. “They even have to understand the answer. So I designed a program that would find the shortest route between two cities in the Netherlands, using a somewhat reduced road-map of the Netherlands, on which I had selected 64 cities.”
The algorithm underpinning Dijkstra's work -- and, indeed, the basic mapping functionality in many programs -- was something he said he came up with while casually drinking coffee. The algorithm itself was published in a 1959 paper and is appropriately known as Dijkstra's algorithm.
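For the curious, the algorithm fits in a few lines. Here is a compact Python version using a priority queue; the toy Dutch road network and its distances are illustrative assumptions, not Dijkstra's original 64-city map.

```python
# Dijkstra's algorithm: single-source shortest paths on a graph with
# non-negative edge weights, using a binary heap as the priority queue.
import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, already found a shorter path
        for neighbor, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

# Toy road network with approximate distances in km (illustrative only).
roads = {
    "Rotterdam": [("Utrecht", 57), ("Amsterdam", 78)],
    "Utrecht":   [("Zwolle", 90)],
    "Amsterdam": [("Zwolle", 111)],
    "Zwolle":    [("Groningen", 105)],
}
print(dijkstra(roads, "Rotterdam")["Groningen"])  # -> 252 on this toy network
```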
Notably, Dijkstra’s algorithm has applications beyond traditional navigation. It’s also been used for things like urban planning, network routing protocols, and optimal chip design.
This article was originally published on BGR.com

Google Clarifies The Mobile-Friendly Algorithm

Google's mobile-friendly ranking algorithm, launching on April 21st, will operate on a page-by-page, real-time basis. But how long will it take to roll out, and how do you know whether your pages qualify to benefit from it?
Because Google has said this algorithm will have a larger impact than the Panda and Penguin algorithms, webmasters are understandably anxious about the release.
Yesterday, Google answered a series of questions in a Google+ hangout on the topic of this new mobile-friendly ranking algorithm. The three things we learned were:
(1) The algorithm will start rolling out on April 21st and will take a few days to a week to roll out completely and globally.
(2) A page is either mobile-friendly or it is not; there are no degrees of mobile-friendliness in this algorithm.
(3) The fastest way to tell whether your web pages are mobile-friendly is to check for the mobile-friendly label in the live mobile search results now. If it isn't there, check the mobile-friendly testing tool, which should match the live Google search results; the mobile usability reports in Google Webmaster Tools, by contrast, can lag based on crawl time.

Roll Out Will Be A Few Days To A Week

I transcribed the response from Google's Mary, who said:
We are expecting it (the mobile-friendly algorithm) to roll out on April 21st. We don't have a set time period because it is going to take a couple of days to roll out -- maybe even a week or so.

Your Page Is Mobile-Friendly Or Not

The mobile-friendly algorithm is an on-or-off algorithm applied on a page-by-page basis: it is not about how mobile-friendly your pages are, but simply whether they are mobile-friendly or not. I transcribed this one as well:
As we mentioned in this particular change, you either have a mobile friendly page or not. It is based on the criteria we mentioned earlier, which are small font sizes, your tap targets/links to your buttons are too close together, readable content and your viewport. So if you have all of those and your site is mobile friendly then you benefit from the ranking change.
But as we mentioned earlier, there are over 200 different factors that determine ranking so we can't just give you a yes or no answer with this. It depends on all the other attributes of your site, whether it is providing a great user experience or not. That is the same with desktop search, not isolated to mobile search.

How Do You Know You Are Mobile-Friendly

How do you know if your web pages will be mobile-friendly or not? There are a few ways, but Google said the easiest is to see whether your current pages already have the mobile-friendly label in the live mobile search results. If so, the mobile-friendly testing tool should also confirm this. Keep in mind that the mobile usability reports in Webmaster Tools can be delayed by crawl time and general Webmaster Tools reporting delays.
I transcribed the three times Google answered this, but I'll share just one here:
Take out your phone and look up your web site. See if there is a gray mobile-friendly label in your description snippet. If it is in the search results -- if you see it -- that means that Google understands that your site is mobile friendly; if you don't see it, then either we don't yet see that your site is mobile friendly or your site is not mobile friendly.
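If you want a quick local sanity check before reaching for Google's tools, the hedged Python sketch below tests only one of the criteria mentioned earlier -- whether a page declares a viewport meta tag. It is a rough heuristic, not Google's mobile-friendly test, and the URL is an illustrative assumption.

```python
# Rough local heuristic only (NOT Google's mobile-friendly test): check whether
# a page declares a viewport meta tag, one of the criteria mentioned above.
import re
from urllib.request import urlopen

def has_viewport_meta(url: str) -> bool:
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
    return re.search(r'<meta[^>]+name=["\']viewport["\']', html, re.IGNORECASE) is not None

print(has_viewport_meta("http://example.com/"))  # hypothetical URL
```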


Friday, March 27, 2015

Little Fanfare as Yahoo Leaves China

Yahoo's decision to close its last remaining operations in China – a research and development center in Beijing – will have little impact on the country's digital industry, experts say, as the company has been largely dormant in the region for the past several years.
Yahoo has had only limited operations in China since Yahoo China was acquired by Alibaba as part of a strategic partnership with Yahoo Inc. in 2005. In 2013, the www.yahoo.cn Web portal was closed and users were given the option to transfer their existing Yahoo China mail accounts to other Yahoo sites or to Alibaba's Alimail platform.
Foreign technology and social media companies, including Google, Facebook, Twitter, and Instagram, have all faced a hostile regulatory and operating environment in China, as well as competition from local tech giants Alibaba, Baidu, Tencent, and Weibo, but experts say Yahoo's decision to pull out of China has just as much to do with cutting costs.
"If they are closing the office, it's because they no longer have user interest and the business is falling off," says one insider, who requested to remain anonymous.

"It's no big deal because nobody cares about Yahoo in China," says another anonymous source. "It's a facility for research and development only, with no actual visibility. The Yahoo brand has been dumped in China since the operation was taken over by Alibaba many years ago."
But from a search industry history point of view, Yahoo's China departure is significant, says Motoko Hunt, president and search marketing consultant at AJPR.
"Yahoo was one of the first well-organized portal sites with a search function for people to get to know the World Wide Web until Baidu and other locally grown search engines and portal sites came out," says Hunt.
"While the operation may have been passed onto Alibaba, the office closure symbolizes the end of an era, and shows how difficult it is for Western businesses to be successful in the Chinese market."
Hunt adds the closure emphasizes the difficulties for foreign companies, especially Western companies, to understand, localize, and adapt to Chinese and other Asian regional markets.
"I give credit to Yahoo that it tried to do that over the years, but it shows how difficult it is for a local team to be heard within a huge corporation to get what they need to grow market share."
In a statement, Yahoo said it was constantly making changes to align resources and to foster better collaboration and innovation across its business.
"We currently do not offer local product experiences in Beijing but the office has served as a research and development center. We will be consolidating certain functions into fewer offices, including to our headquarters in Sunnyvale, California, U.S," the statement said.
The Beijing office closure, which will see its estimated 200 to 300 employees laid off, forms part of a number of cost-cutting redundancies across Yahoo's global network over the past six months.
