ORACLE ENTERPRISE ARCHITECTURE WHITE PAPER: AN ENTERPRISE ARCHITECT’S GUIDE TO BIG DATA


PAGE – 2 ============
Disclaimer

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

PAGE – 3 ============
Table of Contents

Executive Summary 1
A Pointer to Additional Architecture Materials 3
Fundamental Concepts 4
What is Big Data? 4
The Big Questions about Big Data 5
[...]t Big Data? 7
Taking an Enterprise Architecture Approach 11
Big Data Reference Architecture Overview 14
Traditional Information Architecture Capabilities 14
Adding Big Data Capabilities 14
A Unified Reference Architecture 16
Enterprise Information Management Capabilities 17
Big Data Architecture Capabilities 18
Oracle Big Data Cloud Services 23
[...] 24
Big Data SQL 24
Data Integration 26
Oracle Big Data Connectors 27
Oracle Big Data Preparation 28
Oracle Stream Explorer 29
Security Architecture 30
Comparing Business Intelligence, Information Discovery, and Analytics 31
Data Visualization 33
Spatial and Graph Analysis 35
Extending the Architecture to the Internet of Things 36
Big Data Architecture Patterns in Three Use Cases 38
Use Case #1: Retail Web Log Analysis 38
Use Case #2: Financial Services Real-time Risk Detection 39
Use Case #3: Driver Insurability using Telematics 41
Big Data Best Practices 43
Final Thoughts 45

PAGE – 4 ============
Executive Summary

Today, Big Data is commonly defined as data that contains greater variety arriving in increasing volumes and with ever higher velocity. Data growth, speed and complexity are being driven by deployment of billions of intelligent sensors and devices that are transmitting data (popularly called the Internet of Things) and by other sources of semi-structured and structured data. The data must be gathered on an ongoing basis and analyzed, and must then provide direction to the business regarding appropriate actions to take, thus providing value.

Most are keenly aware that Big Data is at the heart of nearly every digital transformation taking place today. For example, applications enabling better customer experiences are often powered by smart devices and make it possible to respond in the moment to customer actions. Smart products being sold can capture an entire environmental context. Business analysts and data scientists are developing a host of new analytical techniques and models to uncover the value provided by this data. Big Data solutions are helping to increase brand loyalty, manage personalized value chains, uncover truths, predict product and consumer trends, reveal product reliability, and discover real accountability.

IT organizations are eagerly deploying Big Data processing, storage and integration technologies in on-premises and public cloud-based solutions. Cloud-based Big Data solutions are hosted on Infrastructure as a Service (IaaS), delivered as Platform as a Service (PaaS), or delivered as Big Data applications (and data services) via Software as a Service (SaaS). Each must meet critical Service Level Agreements (SLAs) for the business intelligence, analytical and operational systems and processes that they are enabling. They must perform at scale and be resilient, secure and governable. They must also be cost effective, minimizing duplication and transfer of data where architecture footprints can now be delivered consistently to these standards. Oracle has created reference architectures for all of these deployment models.

There is good reason for you to look to Oracle as the foundation for your Big Data capabilities. Since its inception, 35 years ago, Oracle has invested deeply across nearly every element of information management from software to hardware and to the innovative integration of both on premises and Cloud- [...] continue to solve the toughest technological and business problems, delivering the highest performance on the most reliable, available and scalable data platforms. Oracle continues to deliver ancillary data management capabilities including data capture, transformation, movement, quality, security, and management while providing robust data discovery, access, analytics and visualization software. [...] value is its long history of engineering the broadest stack of enterprise-class information technology to

PAGE – 5 ============
work together to simplify complex IT environments, reduce TCO, and minimize the risk when new areas emerge such as Big Data.

Oracle thinks that Big Data is not an island. It is merely the latest aspect of an integrated enterprise-class information management capability. Looked at on its own, Big Data can easily add to the complexity of a corporate IT environment as it evolves through frequent open source contributions, expanding Cloud-based offerings, and emerging analyti[...] best-of-breed products, support, and services can provide the solid foundation for your enterprise architecture as you navigate your way to a safe and successful future state.

To deliver to business requirements and provide value, architects must evaluate how to efficiently manage the volume, variety, and velocity of this new data across the entire enterprise information architecture. Big Data goals are not any different than the rest of your information management goals [...] and technology are mature enough to process and analyze this data.

This paper is an introduction to the Big Data ecosystem and the architecture choices that an enterprise architect will likely face. We define key terms and capabilities, present reference architectures, and describe key Oracle products and open source solutions. We also provide some perspectives and principles and apply these in real-world use cases. The approach and guidance offered are the byproduct of hundreds of customer projects and highlight the decisions that customers faced in the course of their architecture planning and implementations.

[...] across many industries and government agencies and have developed a standardized methodology based on enterprise architecture best practices. These should look familiar to architects familiar with TOGAF and other best architecture practices. [...] enterprise architecture approach and framework are articulated in the Oracle Architecture Development Process (OADP) and the Oracle Enterprise Architecture Framework (OEAF).

PAGE – 6 ============
A Pointer to Additional Architecture Materials

Oracle offers additional documents that are complementary to this white paper. A few of these are described below:

IT Strategies from Oracle (ITSO) is a series of practitioner guides and reference architectures designed to enable organizations to develop an architecture-centric approach to enterprise-class IT initiatives. ITSO presents successful technology strategies and solution designs by defining universally adopted architecture concepts, principles, guidelines, standards, and patterns.

The Big Data and Analytics Reference Architecture paper (39 pages) offers a logical architecture and Oracle product mapping. The Information Management Reference Architecture (200 pages) covers the information management aspects of the Oracle Reference Architecture and describes important concepts, capabilities, principles, technologies, and several architecture views including conceptual, logical, product mapping, and deployment views that help frame the reference architecture. The security and management aspects of information management are covered by the ORA Security paper (140 pages) and the ORA Management and Monitoring paper (72 pages). Other related documents in this ITSO library cover cloud computing, business analytics, business process management, and service-oriented architecture.

The Information Management and Big Data Reference Architecture (30 pages) white paper offers a thorough overview of a vendor-neutral conceptual and logical architecture for Big Data. This paper will help you understand many of the planning issues that arise when architecting a Big Data capability.

Examples of the business context for Big Data implementations for many companies and organizations appear in the industry white papers posted on the Oracle Enterprise Architecture web site. Industries covered include agribusiness, communications service providers, education, financial services, healthcare payers, healthcare providers, insurance, logistics and transportation, manufacturing, media and entertainment, pharmaceuticals and life sciences, retail, and utilities.

Lastly, numerous Big Data materials can be found on Oracle Technology Network (OTN) and Oracle.com/BigData.

PAGE – 8 ============
The Big Questions about Big Data

The good news is that everyone has questions about Big Data! Both business and IT are taking risks and experimenting, and there is a healthy bias by all to learn. [...] is that as you take this journey, you should take an enterprise architecture approach to information management; that big data is an enterprise asset and needs to be managed from business alignment to governance as an integrated element of your current information management architecture. This is a practical approach since we know that as you transform from a proof of concept to run at scale, you will run into the same issues as other information management challenges, namely, skill set requirements, governance, performance, scalability, management, integration, security, and access. The lesson to learn is that you will go further faster if you leverage prior investments and training.

Here are some of the common questions that enterprise architects face:

THE BIG DATA QUESTIONS (Areas, Questions, Possible Answers)

Business Context

Business Intent: How will we make use of the data?
» Sell new products and services
» Personalize customer experiences
» Sense product maintenance needs
» Predict risk, operational results
» Sell value-added data

Business Usage: Which business processes can benefit?
» Operational ERP/CRM systems
» BI and Reporting systems
» Predictive analytics, modeling, data mining

Data Ownership: Do we need to own (and archive) the data?
» Proprietary
» Require historical data
» Ensure lineage
» Governance

Architecture Vision

Ingestion: What are the sense and respond characteristics?
» Sensor-based real-time events
» Near real-time transaction events
» Real-time analytics
» Near real-time analytics
» No immediate analytics

Data Storage: What storage technologies are best for our data reservoir?
» HDFS (Hadoop plus others)
» File system
» Data Warehouse
» RDBMS
» NoSQL database

Data Processing: What strategy is practical for my application?
» Leave it at the point of capture
» Add minor transformations
» ETL data to analytical platform
» Export data to desktops

Performance: How to maximize speed of ad hoc query, data transformations, and analytical modeling?
» Analyze and transform data in real-time
» Optimize data structures for intended use
» Use parallel processing
» Increase hardware and memory
» Database configuration and operations
» Dedicate hardware sandboxes
» Analyze data at rest, in-place

Latency: How to minimize latency between key operational components? (ingest, reservoir, data warehouse, reporting, sandboxes)
» Share storage
» High speed interconnect
» Shared private network
» VPN across public networks

Analysis & Discovery: Where do we need to do analysis?
» At ingest: real-time evaluation
» In a raw data reservoir
» In a discovery lab
» In a data warehouse/mart
» In BI reporting tools
» In the public cloud
» On premises

Security: Where do we need to secure the data?
» In memory
» Networks
» Data Reservoir
» Data Warehouse
» Access through tools and discovery lab

Current State

Unstructured Data Experience: Is unstructured or sensor data being processed in some way today? (e.g. text, spatial, audio, video)
» Departmental projects
» Mobile devices
» Machine diagnostics
» Public cloud data capture
» Various systems log files

Consistency: How standardized are data quality and governance practices?
» Comprehensive
» Limited

Open Source Experience: What experience do we have in open source Apache projects? (Hadoop, NoSQL, etc.)
» Scattered experiments
» Proofs of concept
» Production experience
» Contributor

Analytics Skills: To what extent do we employ Data Scientists and Analysts familiar with advanced and predictive analytics tools and techniques?
» Yes
» No

Future State

Best Practices: What are the best resources to guide decisions to build my future state?
» Reference architecture
» Development patterns
» Operational processes
» Governance structures and policies
» Conferences and communities of interest
» Vendor best practices

Data Types: How much transformation is required for raw unstructured data in the data reservoir?
» None
» Derive a fundamental understanding with schema or key-value pairs
» Enrich data

Data Sources: How frequently do sources or content structure change?
» Frequently
» Unpredictable
» Never

Data Quality: When to apply transformations?
» In the network
» In the reservoir
» In the data warehouse
» By the user at point of use
» At run time

Discovery Provisioning: How frequently to provision discovery lab sandboxes?
» Seldom
» Frequently

PAGE – 10 ============
Roadmap

Proof of Concept: What should the POC validate before we move forward?
» Business use case
» New technology understanding
» Enterprise integration
» Operational implications

Open Source Skills: How to acquire open source skills?
» Cross-train employees
» Hire expertise
» Use experienced vendors/partners

Analytics Skills: How to acquire analytical skills?
» Cross-train employees
» Hire expertise
» Use experienced vendors/partners

Governance

Cloud Data Sources: How to guarantee trust from cloud data sources?
» Manage directly
» Audit
» Assume

Data Quality: How to clean, enrich, dedup unstructured data?
» Use statistical sampling
» Normal techniques

Data Quality: How frequently do we need to re-validate content structure?
» Upon every receipt
» Periodically
» Manually or automatically

Security Policies: How to extend enterprise data security policies?
» Inherit enterprise policies
» Copy enterprise policies
» Only authorize specific tools/access points
» Limited to monitoring security logs

Big Data introduces new technology, processes, and skills to your information architecture and the people that design, operate, and use them. With new technology, there is a tendency to separate the new from the old, but we strongly urge you to resist this strategy. While there are exceptions, the fundamental expectation is that finding patterns in this new data enhances your ability to understand your existing data. Big Data is not a silo, nor should these new capabilities be architected in isolation. [...], but there are additional best practices from enterprise-class information management strategies that will ensure Big Data success. Below are some important realizations about Big Data:

Information Architecture Paradigm Shift

Big data approaches data structure and analytics differently than traditional information architectures. A traditional data warehouse approach expects the data to undergo standardized ETL processes and eventually map into pre-defined schemas, also [...]. A criticism of the traditional approach is the lengthy process to make changes to the pre-defined schema. One aspect of the appeal of Big Data is that the data can be captured without structure. Rather, the structure will be derived either from the data itself or through other algorithmic process, also [...]. This approach is supported by new low-cost, in-memory parallel processing hardware/software architectures, such as HDFS/Hadoop and Spark.
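As a rough illustration of capturing data without structure and deriving the structure later, the following Python sketch infers field names and types from raw JSON records at read time. The sample events, field names, and the helper functions `infer_schema` and `project` are hypothetical and are not taken from any Oracle product; a production system would typically do this with a distributed engine rather than in a single process.

```python
import json
from collections import defaultdict

# Hypothetical raw events, stored exactly as captured (no predefined schema).
RAW_EVENTS = [
    '{"device_id": "d-17", "temp_c": 21.5}',
    '{"device_id": "d-18", "temp_c": 19.0, "battery": 87}',
    '{"device_id": "d-17", "status": "door_open"}',
]


def infer_schema(lines):
    """Derive a field -> sorted list of type names mapping from the data itself."""
    schema = defaultdict(set)
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            schema[field].add(type(value).__name__)
    return {field: sorted(types) for field, types in schema.items()}


def project(lines, fields):
    """Apply a structure only when reading; absent fields become None."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


schema = infer_schema(RAW_EVENTS)
rows = list(project(RAW_EVENTS, ["device_id", "temp_c"]))
```

Because the structure is applied only when the data is read, a new field such as `battery` can appear in the source without any change to how the records are captured; only the readers that care about it need to change.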

PAGE – 11 ============
In addition, due to the large data volumes, Big Data also employs the t [...] capabilities to extracting, transforming and loading, thus eliminating the high cost of moving data.

Unifying Information Requires Governance

Combining Big Data with traditional data adds context and provides the opportunity to deliver even greater insights. This is especially true in use cases where [...] with key data entities, such as customers and products. In the example of consumer sentiment analysis, capturing a positive or negative social media comment has some value, but associating it with your most or least profitable customer makes it far more valuable. Hence, organizations have the governance responsibility to align disparate data types and certify data quality. Decision makers need to have confidence in the derivation of data regardless of its source, also known as data lineage. To design in data quality, you need to define common definitions and transformation rules by source and maintain them through an active metadata store. Powerful statistical and semantic tools can enable you to find the proverbial needle in the haystack, and can help you predict future events with relevant degrees of accuracy, but only if the data is believable.

Big Data Volume Keeps Growing

Once committed to Big Data, it is a fact that the data volume will keep growing, maybe even exponentially. In your throughput planning, beyond estimating the basics, such as storage for staging, data movement, transformations, and analytics processing, think about whether new technologies, such as parallel processing, machine learning, memory processing, columnar indexing, and specialized algorithms, can reduce latencies. In addition, it is also useful to distinguish which data could be captured and analyzed in a cloud service versus on premises.
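To make the growth point concrete, the short Python sketch below projects storage volume under a constant compound monthly growth rate and estimates when a fixed capacity would be exhausted. The function names, the starting volume, and the 10% monthly rate are illustrative assumptions only; real planning would use measured ingest rates and account for replication, staging, and retention.

```python
def projected_volume_tb(initial_tb, monthly_growth_rate, months):
    """Volume after `months` of compound growth at a constant monthly rate."""
    return initial_tb * (1 + monthly_growth_rate) ** months


def months_until_full(initial_tb, monthly_growth_rate, capacity_tb):
    """Count whole months until the projected volume reaches the capacity."""
    if monthly_growth_rate <= 0:
        raise ValueError("growth rate must be positive for capacity to be reached")
    months = 0
    volume = initial_tb
    while volume < capacity_tb:
        volume *= 1 + monthly_growth_rate
        months += 1
    return months


# Assumed inputs: 100 TB today, growing 10% per month, 200 TB of capacity.
one_year_out = projected_volume_tb(100, 0.10, 12)
runway = months_until_full(100, 0.10, 200)
```

Even a modest monthly rate compounds quickly, which is why the storage, data movement, and processing estimates above are worth revisiting on a regular schedule rather than fixing once.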
Big Data Requires Tier 1 Production Guarantees

One of the enabling conditions for big data has been low cost hardware, processing, and storage. However, high volumes of low cost data on low cost hardware should not be misinterpreted as a signal for reduced service level agreement (SLA) expectations. Once mature, production and analytic uses of Big Data carry the same SLA guarantees as other Tier 1 operational systems. In traditional analytical environments, users report that if their business analytics solution were out of service for up to one hour, it would have a material negative impact on business operations. In transaction environments, the availability and resiliency commitments are essential for reliability. As the new Big Data components (data sources, repositories, processing, integrations, network usage, and access) become integrated into both standalone and combined analytical and operational processes, enterprise-class architecture planning is critical for success.

While it is reasonable to experiment with new technologies and determine the fit of Big Data techniques, you will soon realize that running Big Data at scale requires the same SLA commitment, security policies, and governance as your other information systems.

Big Data Resiliency Metrics

Operational SLAs typically include two key related IT management metrics: Recovery Point Objective (RPO) and Recovery Time Objective (RTO). RPO is the agreement for acceptable data loss. RTO is the targeted recovery time for a disrupted business process. In a failure operations scenario, hardware and software must be recoverable to a point in time. While Hadoop and NoSQL include notable high availability capabilities with multi-site failover and
