Thursday, January 18, 2018

Tag: petabytes

Enterprises that are embracing a cloud deployment need cost-effective and practical ways to migrate their corporate data into the cloud.

This is sometimes referred to as “hydrating the cloud.” Given the challenge of moving massive enterprise data sets anywhere non-disruptively and accurately, the task can be a lengthy, complicated, and risky process. Not every organization has enough dedicated bandwidth to transfer multiple petabytes without degrading the performance of the core business, or spare hardware to migrate to the cloud.
In some cases, those organizations in a physically isolated location, or without cost-effective high-speed Internet connections, face an impediment to getting onto a target cloud.
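
To put the bandwidth constraint in perspective, here is a rough back-of-the-envelope sketch of how long a bulk transfer takes. The link speeds and the 50 percent utilization figure are illustrative assumptions, not numbers from the article, and a petabyte is taken as 10^15 bytes.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15   # decimal (SI) petabyte, in bytes
GBPS = 10**9        # one gigabit per second, in bits per second


def transfer_days(data_bytes: float, link_bits_per_sec: float,
                  utilization: float = 1.0) -> float:
    """Days needed to move data_bytes over a link.

    utilization is the fraction of the link the migration may use
    (hypothetical knob: the rest is left for the core business).
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    if link_bits_per_sec <= 0:
        raise ValueError("link speed must be positive")
    seconds = (data_bytes * 8) / (link_bits_per_sec * utilization)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    for gbps in (1, 10):
        days = transfer_days(PETABYTE, gbps * GBPS, utilization=0.5)
        print(f"1 PB over {gbps} Gbps at 50% utilization: {days:.1f} days")
```

Even with a dedicated 1 Gbps link running flat out, a single petabyte takes roughly 93 days under these assumptions, which is why organizations without spare bandwidth look for other ways to move data.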

Data must be secured, backed up, and, in the case of production environments, migrated at global enterprise scale without missing a beat.
Man who asked to get back his sports videos never got a court hearing.
The boss’s boss looks out across the server farm and sees data—petabytes and petabytes of data.

That leads to one conclusion: There must be a signal in that noise.

There must be intelligent life in that numerical world, a strategy to monetize all those hard disks filling up with numbers. That job falls on your desk, and you must now find a way to poke around the digital rat’s nest and find a gem to hand the boss.

How? If you’re a developer, there are two major contenders: R and Python.

There are plenty of other solutions that help crunch data, and they live under rubrics like business intelligence or data visualization, but they are often full-service solutions.
If they do what you want, you should choose them.

But if you want something different, well, writing your own code is the only solution.

Full-service tools do a good job when the data is cleaned, buffed, and ready, but they tend to hiccup and even throw up when everything is not quite perfect.
High data density thanks to techniques developed for error-prone communication.
When there's a firehose of data pouring into a data center or cloud storage system, IT managers and storage admins must be prepared to handle it—especially when input spikes.

Data analytics, business metrics and instrumentation are completely changing the way enterprises do business in 2016, due to the preponderance of devices supplying all the additional data for the firehose.

Because enterprises now commonly deal with petabytes of data each year, it's becoming more labor-intensive and complicated to protect that data, even within the structured environment of a corporate database.
In fact, a recent survey of IT managers from Forrester Research found that 71 percent of enterprises continue to struggle with protecting their company's precious data in databases.
In this eWEEK slide show, we use industry information from Forrester Research and the open-source MariaDB community to compile the top eight best practices for protecting your data.
Given the pace at which big data software is released, coupled with the sheer volume of data under management, the big data market is ripe for massive security breaches.
It’s only a matter of time. In fact, as a Gartner survey last year uncovered, very few companies have taken security seriously for essential infrastructure like Hadoop.

At that time, a mere 2 percent of respondents cited Hadoop security as a significant concern, causing Gartner analyst Merv Adrian to exclaim, “The nearly non-existent response to the security issue is shocking.” CIOs, in other words, may be willing to close their eyes and pray for big data security, but until they make it a priority, such “prayers” are vain.

What, me worry?

For years enterprises have taken a somewhat blasé approach to security in big data infrastructure such as Hadoop, despite the size of big data leading to “origins [that] are not consistently monitored and tracked.” In early 2014, Adrian, noting a lack of interest in Hadoop security, queried, “Can it be that people believe Hadoop is secure? Because it certainly is not. At every layer of the stack, vulnerabilities exist, and at the level of the data itself there are numerous concerns.” A year later, Adrian’s colleague, Nick Heudecker, lamented, “Less than 5 percent of Hadoop inquiries covered by the Info Mgmt team in 2014 discussed security. This has to change in 2015.” It didn’t -- not much, anyway.

For example, one security engineer, Ray Burgemeestre, suggested that more and more people are asking, “After enabling all security settings in Hadoop/Spark, how would I know my cluster is actually secure?” The answer, he acknowledged, is “not completely satisfying,” insisting that “more work needs to be done in the Hadoop community to raise its security profile.” Another interested participant in Hadoop security, Bolke de Bruin, Head of Research & Development for ING bank, indicates that while the Hadoop community is increasingly aware of the need to protect data confidentiality within Hadoop clusters, it continues to give limited attention to data integrity (“maintaining and assuring the accuracy and completeness of data over its entire lifecycle”). He goes on to note that even the security native to Hadoop often doesn’t get implemented due to “perceived complexity” or is purposefully ignored because things like Apache Ranger are “slapped on security” that are “usable, but barely.”

Worried yet?

Hadoop is the godfather of big data infrastructure, with the most time and attention paid to it over the past few years. If it can’t muster sufficient security, despite petabytes of sensitive data pouring into its clusters, then we have a very serious security problem across the board.

Who has time?

The problem is time ... or, rather, the lack thereof. As MobileIron highlights in a recent report on mobile security, “[W]ith any software, the longer it is in market, the more likely it is that vulnerabilities will be identified.” This should be particularly true of open source software, which offers the ability to dig into source code before or (more likely) after vulnerabilities emerge. The big data infrastructure market, however, doesn’t sit still long enough for these vulnerabilities to be found.
Indeed, in a December 2015 Gartner report, the authors advise enterprise buyers: “Don't base Hadoop assessment on analysis or trials more than a year old; existing pieces are maturing and new ones are emerging at a rapid pace.” While that “rapid pace” may sound great (innovation ftw!), it’s also ripe for security problems, as mentioned.

As Adrian warns, “We will see major problems as Hadoop goes mainstream.” And not only Hadoop: as enterprises build on Hadoop, Spark, Kafka, and a host of other exceptional, fast-moving data infrastructure, “[W]e are building skyscraper favelas in code -- in earthquake zones,” as Zeynep Tufekci has detailed.

In response, we are already seeing Hadoop vendors like Cloudera and Hortonworks seek to differentiate themselves based on security.
I suspect we’ll see this enterprise-grade security come with an enterprise-grade price tag, but it will be worth it.
Empowers customers to fight back by detecting malicious activity as it appears on the Internet

San Francisco – July 28, 2016 – RiskIQ, the leader in external threat management, today announced general availability for its Security Intelligence Services, a ground-breaking new product that uses the Internet itself as a detection system to automatically defend a network from cyber attacks.

Attackers use automation and can launch sophisticated attacks at very low cost by rotating and reusing undetected infrastructure. RiskIQ has provided defenders with access to Internet datasets, advanced analytics and machine learning to stay one step ahead. With Security Intelligence Services, RiskIQ now detects unknown threats at the source and tracks how attacks change and spread in real time.

“The security team’s visibility is mostly based on what they see on the corporate network, but once they detect a threat locally, the attacker has already moved. This fact limits defenders’ efficacy; they are always playing catch up,” said Arian Evans, VP of Product Strategy at RiskIQ. “Using the Internet as a replacement for the corporate network, we provide real-time information on the attacker as soon as their attack goes live or moves.”

With thousands of customers and processing petabytes of Internet datasets daily, RiskIQ is a pioneer in expanding the reach of the security program to prevent attacks.

The comprehensive service includes:

Passive DNS (PDNS) data, a system of record that stores DNS resolution for a given domain or IP address, provides security analysts with insight into how a particular domain name or IP address changes over time. RiskIQ’s implementation of PDNS enables programmatic links between related domains/IP addresses and, when researching an event, can provide context to an attack or additional malicious domains/IP addresses. PDNS helps identify the indicator of compromise through correlation of historical resolution lookups, time-based analysis, and fully qualified domain name lookups.

WHOIS data, an internet database of ownership information about a domain, IP address or subnet, can give an organization insight into those behind an attack campaign. WHOIS data helps determine the maliciousness of a given domain or IP address based on ownership records. Using domain registration information, an organization can unmask an attacker’s infrastructure by linking a suspicious domain to other domains registered using the same or similar information.

RiskIQ Attack Analytics, a proprietary RiskIQ dataset, is based on malicious observations inside of real-time Internet datasets. As attacks evolve and propagate outside of your network, RiskIQ behavioral analytics identifies cyber threats and provides customers with filtered lists of known bad hosts, domains, IPs and URLs. These feeds allow any enterprise security organization to leverage RiskIQ’s vast Internet datasets and expertise to proactively defend their environment’s networks or endpoints from threats.

Newly Observed Domains, the first of our attack analytics feeds, is a proprietary enriched RiskIQ dataset containing newly resolving domains. Threat actors often programmatically use different domains for their attack campaigns, therefore newly active domains can serve as a guide to whether a domain is legitimate or not. RiskIQ’s continually updated Newly Observed Domains provides customers with near real-time intelligence of domains seen for the first time. Organizations can proactively defend against new domains that could be hosting phishing sites, distributing or operating malware or posing other cyber threats by blocking newly observed domains for a specified time period based on policy and risk tolerance.

“To solve this incredibly difficult problem, RiskIQ has assembled the only complete source of real-time Internet datasets combined with the machine learning and analytics capable of generating truly predictive results,” continued Arian Evans, VP of Product Strategy at RiskIQ. “Security Intelligence Services is a major innovation for threat detection: finding threats first using the Internet as a sensor and then using automation to inform the corporate network to block, thereby freeing up resources and increasing the cost to attackers to launch further attacks, in this current state of rapidly morphing threats.”

Customers can access RiskIQ Security Intelligence Services through a sandbox to test data structures and explore information via a user-friendly interactive application programming interface (API) and documentation.
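
The policy of blocking newly observed domains for a specified time period can be sketched roughly as follows. This is a minimal illustration, not RiskIQ’s API: the `first_seen` mapping stands in for whatever a first-seen feed would provide, and the function name, the seven-day hold period and the allowlist are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping


def is_blocked(domain: str,
               first_seen: Mapping[str, datetime],
               now: datetime,
               hold: timedelta = timedelta(days=7),
               allowlist: frozenset = frozenset()) -> bool:
    """Return True while a domain is still inside its hold period.

    first_seen: hypothetical feed data, domain -> time first observed.
    hold: how long a new domain stays blocked (set by policy/risk tolerance).
    allowlist: domains that are never blocked, regardless of age.
    """
    name = domain.lower().rstrip(".")  # normalize case and trailing dot
    if name in allowlist:
        return False
    seen = first_seen.get(name)
    if seen is None:
        return False  # not in the feed, so not treated as newly observed
    return now - seen < hold


if __name__ == "__main__":
    now = datetime(2016, 7, 28, tzinfo=timezone.utc)
    feed = {
        "new.example": now - timedelta(days=2),
        "old.example": now - timedelta(days=30),
    }
    for d in ("new.example", "old.example", "unknown.example"):
        print(d, "blocked" if is_blocked(d, feed, now) else "allowed")
```

A shorter hold period lowers the chance of blocking a legitimate new site, while a longer one gives reputation data more time to accumulate; the right value depends on the organization’s risk tolerance.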

Data from RiskIQ Security Intelligence Services can then be easily integrated with commonly used security platforms to investigate and protect against threats such as: advanced persistent threats (APT)/malware hosting and distribution; phishing, spear phishing and whaling; domain name abuse/copycat domains; email abuse; watering holes; and malvertising.

For pricing inquiries, please contact sales at RiskIQ. Security Intelligence Services is available on the RiskIQ website.

About RiskIQ

RiskIQ is a cybersecurity company that helps organizations discover and protect their external facing known, unknown and third-party web, mobile and social digital assets.

The company’s External Threat Management platform combines a worldwide proxy and sensor network with synthetic clients that emulate users to monitor, detect and take actions against threats. RiskIQ is being used by thousands of companies including F500s and leading financial institutions to protect their web assets and users from external security threats.
It is headquartered in San Francisco and backed by growth equity firms Summit Partners and Battery Ventures. To learn more about RiskIQ, visit