Why Moving to a Modern Data Warehouse is Worth the Hype

Introduction

Let us look at data from a different perspective today.

Data is the lifeblood circulating through an organization’s systems, and over time, it becomes the DNA that shapes the organization’s future strategies and predictions.

It has always been present, but now more than ever, businesses, institutions, and government bodies are beginning to recognize and harness its true potential, making strides in data analysis.

Hence, data warehousing is a critical element in modern business strategies.

In simple terms, a data warehouse is a place where huge amounts of data from multiple heterogeneous sources are kept for analysis and business intelligence activities. In the late 1980s, data warehouses were merely fragmented data systems. Then, in 2011, data lakes came into being. Although they consolidated siloed data, compute was slow. As we approached 2020, modern data warehouses addressed the challenges posed by data lakes by creating more flexible and unified environments. However, the lack of integration remained an issue. Finally, in 2023, it became possible to deliver a flexible, unified analytics platform integrated with AI.

Enter, Fabric Warehouse by Microsoft! In this blog, we will explore how Microsoft Fabric can substantially mitigate the challenges of traditional data warehousing while offering a scalable platform for future growth. We will discuss various migration strategies and highlight key features of the platform.

Challenges with traditional data warehouses

Cost

Establishing and maintaining a traditional data warehouse involves significant upfront costs.

Scalability

Scaling up or scaling out is challenging, demanding careful planning, significant time, and additional costs.

Architecture

The design and ETL processes necessitate a clear understanding of how the data will be used, lacking the flexibility to adapt to changing needs and goals.

Data Variety and Velocity

Traditional data warehouses have a tough time handling different types of data – structured, unstructured, and semi-structured. Plus, they struggle with streaming data processing. And when it comes to machine learning and AI, they often fall short due to their dependence on outdated, pre-transformed data.

Webinar: Migrate Traditional Data Warehouses to Fabric Modern Warehouse and Serverless SQL. Watch Now.

Factors driving modern data warehouse adoption


1) Data Acquisition

As businesses dive deeper into digitalization and embrace the vast interconnectivity of data from hybrid and multi-cloud setups, along with the staggering 63% growth in data volume, a unified data strategy becomes crucial. Imagine treating data like a service within specific domains, supported by a solid metadata framework. This proactive approach ensures top-notch data quality right from the start, even before setting up data pipelines.

2) Flexible and intelligent data engineering

Can it effortlessly connect with your existing setup, skipping the hassle of tinkering with pipelines, coding from scratch, or sticking to a single programming language? Streamlined integration means reaping rewards faster, boosting testing coverage without breaking the bank. Plus, it offers the freedom to tailor rules for vital pipelines right within your CI/CD workflow.

3) Data management and governance

Built on a security-first approach with compliance in mind.

Can it keep an eye on your data where it sits, without needing to move it? This not only ensures seamless scalability across your data platform but also saves costs. Plus, it guarantees that your organization stays compliant with top-notch security standards like GDPR, PCI DSS, SOC 2, open banking, and HIPAA.

4) Power BI, operational and data science in one place

Imagine having a single source of truth and unified data that powers your business intelligence reports and both batch and real-time ML/AI operations. Data engineers, scientists, and analysts all work with the same reliable data. No duplicates needed. Plus, you can create specific zones for microservices, all pulling from the same data in either batch or near-real-time.

5) Harnessing ML for automated insights

Picture this, your data warehouse platform uses clever machine learning to grasp your data environment automatically. It’s like having a built-in watchdog that alerts you the moment something goes awry. But here’s the catch, it’s not just about spotting anomalies. This system looks at the bigger picture, cutting down on false alarms and saving you from drowning in unnecessary rules and configurations. Say goodbye to wasting valuable engineering time and hello to seamless data monitoring.

Ebook: Migrating to Fabric Warehouse

In the eBook, discover how to tailor analytics models to your needs using best practices for Serverless SQL, optimizing data queries without managing infrastructure. Gain insights to avoid pitfalls, ensure a smooth transition, and expand the reach of transformative analytics applications.

Get the eBook

Conclusion

It’s a game-changer! With tools like Microsoft Fabric leading the charge, businesses are poised to break free from the constraints of traditional methods. Experience coherently integrated data, effortless scaling, and a watchful AI-powered eye on your operations, all without the headache of false alarms and tedious configurations. We are all ears when it comes to your specific data management concerns. Please contact us with any questions about your business requirements, or if you are planning a swift migration to a modern data warehouse.

The future of data management is here, and it’s full of promise.

Migrate your SIEM to Microsoft Sentinel

Don’t Be In The News Due To A Cyber Attack

Getting your security modernized and integrated is critical to ensuring a proper security posture and avoiding a cyber-attack that can lead to significant damage.

Security Information and Event Management (SIEM) plays a critical role in collecting data and providing insights to seek out and flag suspicious activities. However, traditional SIEM solutions lack the ability to detect attacks that span multiple security layers. They also struggle to correlate individual alerts into a full incident and determine the best way to protect and restore assets. Using many siloed security tools results in slower response times, less visibility into attacks, and more detailed engineering work to connect all the data, leading to increased burnout. Additionally, the rising cost of staff and licenses, the complexity of engineering and maintenance, and the inability to cover your full estate with one tool make an on-premises solution less appealing.

Microsoft Sentinel is a modern, cloud-native SIEM powered by AI, automation, and Microsoft’s deep understanding of the threat landscape. It empowers defenders to hunt and resolve critical threats at machine speed and at a lower total cost of ownership (TCO). It’s time to transition to Microsoft Sentinel.

🟊 YOU MAY BE ELIGIBLE FOR SOME MICROSOFT INCENTIVES TO HELP YOU TRANSITION.

Netwoven experts can help deploy Microsoft Sentinel and help migrate from your existing SIEM tool. We specialize in migration from many SIEM tools such as:

  • ➔ Splunk
  • ➔ QRadar
  • ➔ Logpoint
  • ➔ FireEye
  • ➔ Darktrace
  • ➔ Cisco SecureX
  • ➔ Symantec
  • ➔ Trend Micro
  • ➔ CrowdStrike
  • ➔ McAfee
  • ➔ Exabeam

Our proven process ensures on-time, on-budget and quality delivery.

Benefits of Migrating to AI-Powered Unified SecOps:

  • Unified AI-Powered Platform: Leverage an AI-powered unified security operations platform with integrated SIEM.
  • Zero Trust Security: Adopt a Zero Trust security strategy with fully integrated defense across identities, endpoints, network, apps, data, and infrastructure.
  • Modern SecOps: With built-in security orchestration, automation, and response (SOAR) capabilities, user and entity behavior analytics (UEBA), and threat intelligence (TI), customers get a complete SecOps solution that is both easy and powerful — at a fraction of the cost and hassle of standalone SIEM and SOAR solutions.
  • Real-Time Threat Detection: Leverage advanced threat intelligence to halt attacks promptly.
  • Actionable Insights: Utilize data-driven insights to enhance your security strategy.

As a Microsoft partner with extensive experience providing cybersecurity solutions, Netwoven can help you deploy Microsoft Sentinel so you can fortify your security operations using advanced AI and comprehensive threat intelligence across your entire digital estate.

4 Ways to Prevent Insider Threats with Microsoft Purview

Introduction

You might have heard of the book, “Eat that Frog”. 

The phrase “Eat that Frog” means something that is difficult to face, but one must do it anyway. Experts suggest that we must start with the most difficult task. 

So, before we dive in, I have a few challenging questions for you. 

Has your organization ever encountered issues with certain employees? Or, to put it another way, do you suspect any employee might be harboring grudges against the organization? 

If your answer is yes, then what did you do to clear the air? 

If there were disputes in the past that were not handled efficiently, then your organization’s sensitive data might be at huge risk from insider threats.

On the other hand, if your answer is no, your organization may still face insider risks despite having a strong workplace culture. Data leaks or breaches can occur due to rookie mistakes or accidents.

Insider Threat Risk and Data Exfiltration Landscape

Insider threats and data exfiltration can arise from various factors, including financial gain, revenge, or ideological beliefs. They can also be unintentional, such as when an employee accidentally exposes sensitive data or breaches a security policy.

Crowd Research Partners reports that 90% of organizations feel vulnerable to insider attacks due to factors like excessive access privileges, an increased number of devices accessing sensitive data, and the growing complexity of IT systems. Additionally, 53% of organizations have confirmed insider attacks within the past year.

According to the Ponemon Institute’s 2022 Cost of Insider Threats: Global Report, the average cost of an insider incident is $11.4 million, and the average time to contain such an incident is 77 days (about 2 and a half months). 

These statistics underscore the importance of Insider Risk Management (IRM). Implementing effective IRM practices can provide organizations with significant advantages. 

  • Reduced risk of data breaches and other security incidents  
  • Improved data protection and privacy  
  • Lower costs associated with insider incidents  
  • Enhanced employee awareness and accountability 

It all boils down to organizations struggling with a fragmented solutions landscape. 80% of decision makers purchased multiple products to meet compliance and data protection needs. 

Microsoft Purview is a cloud-based solution that can help organizations effectively manage insider risk. Purview offers comprehensive tools for detecting, investigating, and responding to insider threats. It also aids in preventing these threats by providing visibility into user activity and enforcing security policies.

Ebook: 4 ways Microsoft Purview can help you identify and mitigate insider threats

This eBook provides authoritative guidance on identifying potential insider threats, investigating insider incidents, remediating their impact, and preventing future occurrences.

Get the eBook

How Microsoft Purview can help you identify and mitigate insider threat risks

1. Identifying potential insider threats

Purview utilizes various signals to identify potential insider threats, including:

  • User activity: Purview monitors user activity across various sources, including Microsoft 365, Azure Active Directory, and endpoints. 
  • Data access: Purview tracks user access to sensitive data. 
  • Risk indicators: Purview uses various factors to identify risk indicators, such as changes in user behavior or access to unauthorized data. 
2. How do you respond to an insider threat?

When a potential insider threat is detected, Purview equips investigators with a comprehensive set of tools to thoroughly investigate the incident.

  • Activity Logs: Purview offers detailed logs that enable the reconstruction of user activity. 
  • User Profiles: Purview offers user profiles that include details on employment history, access permissions, and risk scores. 
  • Data Loss Prevention (DLP) Alerts: Purview generates alerts when sensitive data is accessed or exfiltrated. 
Ebook: 7 Steps to building a Compliance Based Organization with Microsoft Purview Solutions

In this eBook, you’ll learn about the regulatory landscape and the importance of compliance, common compliance challenges, and how to understand, implement, and use Microsoft Purview for compliance effectively.

Get the eBook
3. How to solve an insider threat?

Once an insider incident has been investigated, Purview offers tools to remediate it. 

  • Remediation Tools: Purview equips security teams with the necessary tools to address insider incidents, enabling them to investigate, gather evidence, and take appropriate action. 
  • Continuous Monitoring: Purview consistently tracks user activity and data access to promptly detect and address insider incidents. This proactive approach helps prevent insider threats from causing harm in the first place. 

Webinar: Protect your organization by staying compliant using Microsoft Purview. Watch Now.

4. How are insider threats prevented?

Purview additionally aids organizations in preventing insider incidents by offering insight into user activity and enforcing security policies. 

  • Raise awareness among employees regarding insider threats. 
  • Establish a robust identity and access management (IAM) program. 
  • Monitor user activity rigorously and enforce security policies effectively. 

You may also like : Data Security and Governance

Conclusion

To sum it up, Microsoft Purview Insider Risk Management is an all-in-one solution designed to aid organizations in identifying, assessing, and mitigating insider threats. Leveraging machine learning and artificial intelligence, the platform can detect various risky behaviors, such as data exfiltration, intellectual property theft, and account compromise. Additionally, it offers a suite of tools to facilitate the investigation and response to insider incidents.  

If this is enough to pique your interest, don’t forget to share your thoughts with us. We will be happy to clarify any of your doubts around Microsoft Purview insider threat management. 

Predicting Heart Disease Risk using ML Model with Microsoft Fabric

Introduction

We’re constantly in a race against time, 100 mph. What are we trying to prove?  

Ever wondered how much our heart is susceptible to daily work-related stress and in turn cardiovascular diseases? 

Here is the irony. It is a fatal health risk globally, but we rarely pause to give it serious consideration. Approximately 80% of cardiovascular disease (CVD) deaths result from strokes and heart attacks, and about 33% of these fatalities occur prematurely in individuals under the age of 70. 

In this blog, we highlight how to develop a Machine Learning (ML) model that can predict heart attack risk among existing patients using Microsoft Fabric.

How to develop a Machine Learning Model that predicts the risk of heart attack

A healthcare provider used a database of historical patient details and created a dataset of patients with one or two risk factors such as diabetes, hyperlipidemia, hypertension, or already established disease. It includes 11 features that can be utilized to predict the likelihood of heart disease. 

We will show you a step-by-step approach to develop a Machine Learning model that can be used to predict high-risk heart attacks among their existing patients so that effective preventive treatment can be initiated. 

Ebook: Machine Learning with Microsoft Fabric

Netwoven, a leading Microsoft consulting firm, brings you this comprehensive guide to unlocking the transformative potential of Artificial Intelligence (AI) within your familiar Microsoft 365 environment.

Get the eBook

From Data to Decisions in Microsoft Fabric – A step-by-step Machine Learning Model 

Buckle up for a step-by-step journey through a straightforward data science project. 

Together, we’ll start by procuring data, then create ML models, and finally use these models to make inferences within Microsoft Fabric. 

Here we go! 

1. Select a Workspace

First things first. We start by creating a Microsoft Fabric workspace. This will serve as a location to store data and code while working on the prediction model. 

Microsoft Fabric Workspace is the ideal choice here because it provides highly scalable compute, enterprise-scale data storage and distribution capabilities.

To store our data, we’ve established a Lakehouse. A Lakehouse is a modern data architecture that combines the best features of data lakes and data warehouses, allowing for both unstructured and structured data storage and management. Since our data often comes in unstructured formats, such as CSV files, we upload these files to the Files section of the Lakehouse, which is specifically designed for unstructured data.

2. Code  

To work with data, we’ve chosen Python because it offers a wide range of enterprise-grade data science libraries. We’ll write and execute our code using a Notebook stored within the same workspace, which will be connected to the Lakehouse. 

Machine Learning Model - Code setup
3. Loading Data for Processing

We begin our data work by loading the CSV file from the Lakehouse into memory as a DataFrame within the Notebook.

Machine Learning Model - Loading Data for Processing
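
The post shows this step only as a screenshot, so here is a minimal sketch of what the load might look like in a Fabric Python notebook; the file name and mount path are assumptions, not taken from the post.

    import pandas as pd

    # Files uploaded to the Lakehouse "Files" section are typically available
    # under /lakehouse/default/Files/ in a Fabric notebook (path is illustrative).
    csv_path = "/lakehouse/default/Files/heart.csv"  # hypothetical file name

    # Load the raw CSV into an in-memory DataFrame for exploration and modeling.
    df = pd.read_csv(csv_path)
    print(df.shape)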
4. Data Exploration

The initial step in model construction involves understanding the data through exploration of the dataset’s stored values. Here, we commence by analyzing the data structure using DataFrame methods to identify column types, null values, duplicate records, and more.

Machine Learning Mode - Data Exploration

Furthermore, we employ statistical methods to analyze values across different columns or features, such as calculating means, counts, deviations, and the spread of values. 

Machine Learning Mode - stats
Machine Learning Mode - data object
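
A hedged sketch of the structural and statistical checks described above, continuing from the previous snippet; the exact methods used in the original notebook are not shown in the post.

    # Inspect structure: column types, non-null counts, duplicates, and nulls.
    df.info()
    print("Duplicate rows:", df.duplicated().sum())
    print("Null values per column:")
    print(df.isnull().sum())

    # Summary statistics: counts, means, standard deviations, and quartiles.
    print(df.describe())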

Next, we explore visually by employing various types of charts. Initially, we examine a combined histogram that encompasses all columns.

Machine Learning Model - the column
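
One way the combined histogram could be produced, shown here as a rough illustration (the blog only shows the resulting chart):

    import matplotlib.pyplot as plt

    # One histogram per column in a single figure, mirroring the combined chart above.
    df.hist(figsize=(14, 10), bins=20)
    plt.tight_layout()
    plt.show()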
Some observations
  1. Age looks well-rounded.
  2. Most patients are male.  
  3. Number of patients with 0 cholesterol seems to be an aberration
Machine Learning Model - the graph of patient

Plot value counts in different features/columns to understand the spread of data and aberrations 

In the second chart presented here, the type of heart pain is plotted against patients diagnosed with heart disease.

Machine Learning Model - the data
Machine Learning Model  - graph
Observations 
  1. As expected, most patients without heart diseases do not have TA heart pain.  
  2. Unexpectedly, most patients with heart disease have asymptomatic heart pain. 
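
A sketch of how such a breakdown might be computed and plotted; the column names (ChestPainType, HeartDisease) are assumptions based on a typical heart-disease dataset and are not confirmed by the post.

    import matplotlib.pyplot as plt

    # Count patients by chest pain type, split by heart-disease diagnosis.
    counts = df.groupby(["ChestPainType", "HeartDisease"]).size().unstack(fill_value=0)
    print(counts)

    counts.plot(kind="bar", figsize=(8, 5))
    plt.xlabel("Chest pain type")
    plt.ylabel("Number of patients")
    plt.show()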

Although the male-to-female ratio in the sample leans towards males, the data for patients with heart disease is even more skewed, suggesting that male patients are more likely to have heart disease according to the data. 

Machine Learning Model  - data
Machine Learning Model - data details

In this 3rd example, the sex of the patient is plotted against heart disease.

5. Select Features to be Used for Building the Model 

To make accurate predictions, it’s crucial to select the most relevant features for predicting the target variable. To prevent overfitting, we must exclude features with minimal dependency. To achieve this, we’ll employ a correlation matrix to assess the correlation between features and targets. Strong correlations are indicated by values close to 1 or -1. 

Select Features to be Used for Building the Model
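
An illustrative way to compute such a correlation matrix in pandas; the target column name is an assumption.

    # Correlation of numeric features with each other and with the target.
    # Values near 1 or -1 indicate strong linear relationships.
    corr = df.corr(numeric_only=True)
    print(corr["HeartDisease"].sort_values(ascending=False))  # target name assumed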
Observations

In this case, none of the features alone indicate the presence of heart disease. Hence, we need to use almost all except a few with very low correlation for prediction modelling.

Select Features to be Used for Building the Modeling
6. Prepare Data for Modeling 

Since we’ve identified the features we want to utilize for modeling, we’ll proceed to transform the data to enhance our modeling and learning process. While the specific steps may differ for each dataset, the ones outlined here are essential; a minimal code sketch follows the list below. 

  • Separate target variable from features 
Prepare Data for Modeling - step 1
  • Convert categorical fields to numeric fields 
Prepare Data for Modeling - step 2
  • Divide data into training & testing datasets 
Prepare Data for Modeling - step 3
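
A minimal sketch of these three preparation steps; the target column name is assumed to be HeartDisease and the split ratio is illustrative.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # 1) Separate the target variable from the features.
    y = df["HeartDisease"]
    X = df.drop(columns=["HeartDisease"])

    # 2) Convert categorical fields to numeric fields via one-hot encoding.
    X = pd.get_dummies(X, drop_first=True)

    # 3) Divide data into training and testing datasets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )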
7. Normalize Training Dataset 

To ensure unbiased learning, models require a comparable number of instances for all target values. Achieving this involves either generating additional sample data or removing similar data values. As a result of this step, you’ll observe that the number of records is now equal for both target values. 

Normalize Training Dataset
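
The post does not show how the classes are balanced; one simple option is random oversampling of the minority class, sketched here purely for illustration and continuing from the previous snippet.

    import pandas as pd

    # Rebalance the training set so both target values have the same number of rows.
    train = pd.concat([X_train, y_train], axis=1)
    majority_label = train["HeartDisease"].mode()[0]
    majority = train[train["HeartDisease"] == majority_label]
    minority = train[train["HeartDisease"] != majority_label]

    minority_upsampled = minority.sample(n=len(majority), replace=True, random_state=42)
    balanced = pd.concat([majority, minority_upsampled]).sample(frac=1, random_state=42)

    X_train_bal = balanced.drop(columns=["HeartDisease"])
    y_train_bal = balanced["HeartDisease"]
    print(y_train_bal.value_counts())  # both target values now have equal counts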

Ebook: Fabric Copilot Generative AI and ML

This eBook gives you all the information about Fabric Copilot and how simple it is to enable this generative AI capability. It brings new ways to transform and analyze data, generate insights, and create visualizations and reports in Microsoft Fabric.

Get the eBook
8. Create Data Science Experiment 

Initially, we’re uncertain which algorithm will be most effective. Considering the prediction requirements, we opt for a classification model. From this domain, we select three models: RandomForestClassifier, LogisticRegression, and XGBClassifier. We then conduct an experiment, training these models with the data to compare their predictive performance. 

Create Data Science Experiment
Create Data Science Experiment - part 2
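
A hedged sketch of such an experiment using MLflow (which Fabric notebooks support) and the three classifiers named above; the experiment name and metric names are illustrative, and it continues from the earlier snippets.

    import mlflow
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, f1_score
    from xgboost import XGBClassifier

    mlflow.set_experiment("heart-disease-risk")  # experiment name is illustrative

    models = {
        "RandomForestClassifier": RandomForestClassifier(random_state=42),
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "XGBClassifier": XGBClassifier(eval_metric="logloss"),
    }

    # Train each candidate model in its own run so their metrics can be compared later.
    for name, model in models.items():
        with mlflow.start_run(run_name=name):
            model.fit(X_train_bal, y_train_bal)
            preds = model.predict(X_test)
            mlflow.log_metric("accuracy", accuracy_score(y_test, preds))
            mlflow.log_metric("f1", f1_score(y_test, preds))
            mlflow.sklearn.log_model(model, artifact_path="model")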
9. Evaluate Model Performance 

Following that, we proceed to evaluate the models. To do so, we navigate to the experiment created in the previous step and compare their performance using the “View run list” option. 

Evaluate Model Performance 
10. Select Best Model

Now, we’re able to compare the performance of models using various parameters such as accuracy, F1 score, and more. This enables us to make a well-informed decision regarding which model to utilize. 

Select Best Model
11. Make ML Model Available to the Organization

We’re now prepared to share our ML model with the rest of the organization. To do this, we select the model and utilize the “Save” button to store this model in a dedicated workspace, which is then shared across the organization. As this experiment is conducted repeatedly with new data, new versions of the model can be saved within the same workspace, enabling data scientists to select appropriate versions and compare them with older ones.

Make ML Model Available to the Organization
Make ML Model Available to the Organization - the model
12. Inference 

To perform inference, follow these steps in your workspace: 

  • Open the ML Model. 
  • Select the desired version. 
  • Click “Apply this version.” 
 Inference
Inference details
To proceed, follow these steps in the wizard
  1. Select the delta table containing the data.
  2. Map the columns appropriately.
  3. Enter a table name to store the results.
  4. Specify the column name for the target values.
follow these steps in the wizard

You can either create a new Notebook with the code or copy it into an existing Notebook to perform inference or predictions.

Notebook to perform inference or predictions
13. Execute Notebook for Inference

When executed, the code downloads the model, performs predictions using the provided data, and stores the results in a new delta table; it can be customized to work with different data sources or output formats.

Execute Notebook for Inference
Execute Notebook for Inference - file details
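
The generated code itself is not reproduced in the post; the sketch below shows one plausible shape for it, assuming an MLflow-registered model and the built-in spark session available in a Fabric notebook. The model name, version, and output table name are placeholders.

    import mlflow

    # Load a specific version of the registered model (name/version are illustrative).
    model = mlflow.sklearn.load_model("models:/heart-disease-risk-model/1")

    # Score records prepared with the same encoding as training; X_test stands in
    # for the delta table selected in the wizard.
    scored = X_test.copy()
    scored["IsHeartDisease"] = model.predict(X_test)

    # Persist predictions to a new delta table in the Lakehouse via Spark.
    spark.createDataFrame(scored).write.mode("overwrite") \
        .format("delta").saveAsTable("heart_disease_predictions")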
14. Predicted Data

The data now includes an additional “IsHeartDisease” column, which contains the intended predictions. These predictions can be used as the final data product for reporting or further processing.

Predicted Data

Conclusion

In wrapping up, I hope this blog has shed some light on breaking down a Machine Learning model for predicting heart attack risks using Microsoft Fabric. The same approach can be applied to other types of predictive analysis. The key is to pick the right ML algorithm with the help of human experts, ask the right research questions, and track all relevant evaluation metrics. It’s also important to compare the model against conventional risk models. Once all these steps are taken care of, the ML model can be integrated with various healthcare devices to better monitor patient data and improve predictions. For more information, don’t hesitate to reach out. We’d love to hear your thoughts and explore new possibilities together in the realm of Machine Learning within Microsoft Fabric.

Navigating the Migration to Microsoft Intune with Netwoven

Introduction

In the intricate process of migrating to Microsoft Intune, a successful communication and support strategy, coupled with a robust implementation plan, is crucial for facilitating user adoption. However, it’s common in most migrations to encounter a subset of users resistant to change. Proactively devising strategies to minimize or eliminate resistance is a critical component of the planning phase. 

Addressing Migration Challenges

Several strategies can be employed to address migration challenges, each varying in applicability based on the company’s context: 

Establishing Cut-off Dates

We should set clear cut-off dates for each migration phase, or for the migration overall, so that users know when the old service will stop. This could involve remote wiping of devices or restricting access to enterprise data. Clear communication about this process is vital.

Rolling Refresh of Older Devices

For devices where migration is complex or impossible, a strategy of replacing them with new devices pre-configured with the target Mobile Device Management (MDM) can be effective. 

Identifying Inactive Users

Active users typically migrate over time, but those who seldom use their devices might overlook migration communications. Identifying such users on the previous MDM and targeting them with specific communication or actions can be beneficial. 

A combination of these strategies, tailored to the company’s needs, often yields the best results. 

Coexistence Strategy During Transition

Migration to Intune doesn’t happen overnight. Managing both the old and new environments during the transition is crucial. This includes: 

Operations Management

Ensuring that teams like helpdesk and administrators can operate both systems efficiently with clear processes for handling incidents and requests.

New Device Enrollment

Deciding which platform to use for new enrollments during the transition, which could vary based on different organizational units or roles. 

Resource Access Management

Ensuring systems like Conditional Access, Wi-Fi networks, and VPN solutions can accommodate both Intune-managed and third party-managed devices. 

Each aspect requires careful consideration and planning to ensure a smooth transition and uninterrupted access to corporate resources.

In summary, the migration to Microsoft Intune is a multifaceted process requiring strategic planning, effective communication, and a thorough understanding of both technological and human factors. Addressing these elements proactively ensures a smoother transition and higher user adoption rates. Let’s now take a closer look at the different stages of migration and the approaches associated with each stage.

Preparing Current Microsoft Intune & Entra ID State

Before migrating, evaluate your existing Intune and Entra ID setups to understand their current utilization and configuration. 

Key Points to Review:
  1. Entra ID Accounts: Checking the setup and status. 
  2. Terms and Conditions: Reviewing terms set within the system. 
  3. Apple Push Certificate Configuration: Ensuring correct configuration. 
  4. Apple Business Manager and Apple VPP Integration: Verifying integration. 
  5. Managed Google Play Integration: Checking the integration status. 
  6. Company Portal Customizations: Reviewing any customizations. 
  7. Enrollment Restrictions: Assessing restrictions on device enrollments. 
  8. Policies: Examining all configuration, compliance, and conditional access policies. 
  9. Licensing: Ensuring proper licensing for Intune. 
  10. Conditional Access Policies: Reviewing the policies in place.

Migration Approach and Roadmap

The Phases of the Migration Process to Microsoft Intune

Migrating to a new management solution like Microsoft Intune is a significant undertaking for any organization. This technical guide delves into the structured process of migration, ensuring a smooth transition for device and application management. We’ll explore the key phases of the migration process, review the current state of Microsoft Intune & Entra ID, and outline the specific steps for migrating both Windows and mobile devices. 

Phase 1: Assess

The assessment phase lays the foundation for a successful migration. It involves a comprehensive review of your current deployment, focusing on various inventories and operational requirements.

Key Activities:
  1.  Device Inventory: Cataloging all devices, their types, models, and operating systems. 
  2. Application Inventory: Listing all applications, including custom and third-party apps. 
  3. Content Distribution: Reviewing the current distribution methods across the network. 
  4. Configuration Inventory: Documenting existing configurations and settings. 
  5. Users and Groups Review: Analyzing the setup of users and groups. 
  6. Report Inventory: Gathering existing reports and monitoring tools. 
  7. Integration Inventory: Identifying integrations with other systems and services. 
  8. Operations Review: Examining current operational processes. 
  9. Entra ID and Intune Tenant Review: Assessing the readiness of your Intune tenant and Entra ID. 
Phase 2: Design & Plan

Using the information from the assessment phase, you’ll begin planning your deployment, focusing on setting up your Intune tenant and developing a migration strategy.

Key Steps:
  1. Initial Tenant Configuration: Setting up your Intune tenant for deployment. 
  2. Scenario Design and Planning: Developing plans for each platform (iOS, Android, Windows, etc.). 
  3. Platform Enablement or Restriction: Deciding which platforms to enable or restrict. 
  4. Automated Enrollment Configuration: Configuring automated enrollment for each platform. 
  5. Configuration Engineering: Developing new configurations needed for Intune. 
  6. Migration Scenario Validation: Ensuring that your migration scenarios are feasible and effective. 
Phase 3: Test

Testing is essential to validate your migration plan, involving setting up test tenants to simulate the migration and validate various scenarios. 

Essential Testing Activities:
  1. Enrollment Validation: Testing the enrollment process for new devices. 
  2. Migration Validation: Ensuring smooth migration of existing devices. 
  3. Scenario Validation: Testing each planned scenario. 
  4. Operations Validation: Validating operational procedures and processes. 
  5. Device Decommission Validation: Testing the process for decommissioning old devices.
Phase 4: Deploy

The deployment phase is where the migration plan is executed in the production environment, typically done in stages to minimize disruptions.

Deployment Steps:
  1. Initial Production Validation Groups: Starting with small groups to validate the deployment. 
  2. Phased Expansion: Gradually increasing the size of the deployment groups. 

Migration Steps for Windows to Microsoft Intune

Migrating Windows devices involves several critical steps, each with its own considerations.

Key Steps:
  1. Wiping the Device: Resetting the device to its factory state. 
  2. Unenrolling from Existing Management Platform: Ensuring data preservation while changing management platforms. 
  3. Pre-Migration Preparations: Including assumptions, a functioning Push Button Reset (PBR), and in-place upgrades. 
  4. Migration Considerations: Covering user communications, power management policy changes, and blocking enrollment in legacy environments. 
  5. Technical Preparations: Deploying persistent provisioning packages and harvesting Autopilot hardware hashes. 
  6. Post-Reset Actions: Cleaning up legacy device objects and importing Autopilot hardware hashes. 
  7. Enrollment and Post-Migration: Enrolling devices using Autopilot and conducting mop-up activities.

Migration Steps for Mobile Devices to Microsoft Intune

Migrating mobile devices also involves distinct scenarios.

Key Scenarios:
  1.   Deploy Company Portal App: Use your current management solution to push the Company Portal application to devices. Ensure it remains installed after unenrollment. 
  2. Enable User-Initiated Unenrollment: Allow users to unenroll their devices from the current management system. 
  3. Implement Conditional Access Policy (Optional but Recommended): Set up a policy requiring device compliance for accessing corporate resources. Carefully select the user groups for this policy. 
  4. Manual Device Enrollment by Users: Instruct users to enroll their devices in Intune using the Company Portal, just as they would for new or unmanaged devices. 
  5. Provide Setup Guidance: Offer instructions to users for reconfiguring their applications, such as email clients and collaboration tools, post-enrollment. 

Conclusion

In the rapidly evolving landscape of device and application management, enterprises seeking to transition from their existing Mobile Device Management (MDM) systems to Microsoft Intune can find a robust and reliable partner in Netwoven. Netwoven plays a pivotal role in guiding enterprises through the complexities of migration, ensuring a seamless and efficient transition to Microsoft Intune.

Expertise and Customized Strategies

Netwoven brings to the table deep expertise in Microsoft technologies, offering tailored strategies that align with the unique needs of each enterprise. Our approach is not one-size-fits-all. Instead, we meticulously assess the current MDM environment of an enterprise and devise a migration plan that minimizes disruptions and maximizes efficiency.

Comprehensive Assessment and Planning

Netwoven begins the migration journey with a thorough assessment phase, in which we catalog and review the existing device and application infrastructure. This detailed analysis forms the foundation for a well-structured migration plan, ensuring that every aspect of the current system is considered and appropriately transitioned to Intune.

Streamlined Migration and Testing

With a clear plan in place, Netwoven expertly navigates enterprises through the migration process, ensuring that each step, from initial tenant configuration to scenario validation, is executed with precision. Our rigorous testing phase is particularly crucial, as it validates the migration strategy and ensures that all systems function optimally in the new Intune environment. Feel free to contact us for an Endpoint Management Workshop.  

Microsoft Fabric Internals Deep Dive

Introduction

Microsoft Fabric offers a cohesive Software as a Service (SaaS) solution, encompassing the essential functionalities for analytics across Power BI, Microsoft Azure Data Factory, and the upcoming iteration of Synapse. Fabric consolidates data warehousing, data engineering, data science, data integration, real-time analytics, applied observability (through the Data Activator experience), and business intelligence within a unified architecture.

In this blog, we aim to give our customers a clear and concise explanation of Microsoft Fabric internals, so they can see how it differs from other data warehouses and the important role it plays in enterprise digitalization.

Key Differentiators and Importance

Unified Experience

Unlike traditional siloed solutions, Fabric provides a seamless end-to-end experience for data management, analytics, and observability.

Efficiency

By consolidating services, Fabric streamlines workflows, reduces complexity, and accelerates time-to-insights.

Scalability

Fabric scales effortlessly to handle enterprise-scale data volumes and diverse workloads.

Strategic Impact

As organizations embrace digital transformation, Fabric becomes a strategic enabler for data-driven decision-making, innovation, and growth.

Microsoft Fabric isn’t just another data warehouse—it’s a holistic ecosystem that empowers enterprises to harness their data effectively and drive meaningful outcomes. 

Microsoft Fabric Services

Let’s deep dive into the fascinating world of Unified Data Management.

1. Lakehouse Delta Lake Format with V-Order Compression and Versioning
  • The concept of a lakehouse combines the best of data lakes and data warehouses. It allows organizations to store vast amounts of raw data while also providing a warehouse’s structure and query capabilities.
  • Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads. It ensures data consistency, reliability, and performance.
  • V-Order compression is a technique that compresses data using variable-length codes, optimizing storage efficiency.
2. Bin Compaction for Improved Performance
  • Bin compaction optimizes storage by grouping data into bins or segments. It reduces fragmentation and enhances query performance.
3. Virtual Warehouses and Serverless Pools
  • Virtual warehouses are scalable, on-demand computing resources for running queries against data stored in cloud data warehouses.
  • Serverless pools provide automatic scaling based on workload demand, allowing efficient resource utilization without manual provisioning.
4. Integrated Services
  • A unified approach integrates various data services, such as data cataloging, lineage tracking, and governance, into a cohesive platform.
5. BI Reporting and Data Science in One Platform
  • Having business intelligence (BI) reporting and data science tools within the same platform streamlines analytics workflows and promotes collaboration.
6. OneNote Book for Data Pipelines and Data Science
  • OneNote provides a collaborative environment for documenting data pipelines, experiments, and insights.
7. Zero Table Cloning
  • Eliminating table cloning reduces redundancy and simplifies data management.
8. Data Engineering Services
  • These services encompass tasks related to data ingestion, transformation, and preparation.
9. Shortcuts with APIs for Custom Code
  • Developers can create custom shortcuts using APIs, enhancing productivity and flexibility.
10. OneSecurity and Data Governance
  • Ensuring data security and governance across the entire data lifecycle is critical for compliance and risk management.
Migrating to Fabric Warehouse

In this eBook, we will outline how Microsoft Fabric can significantly reduce the issues facing traditional data warehousing and provide a scalable platform for future growth.

Get the eBook

Microsoft Fabric Data Warehouse is an intriguing cloud-native data warehousing solution that harnesses the power of the Polaris distributed SQL query engine.

Let’s get  into the details

1. Polaris Engine
  • Stateless and Interactive: Polaris stands as a stateless, interactive relational query engine that drives the Fabric Data Warehouse. It’s designed to seamlessly unify data warehousing and big data workloads while segregating compute and state.
  • Optimized for Analytics: Polaris is a distributed analytics system, meticulously optimized for analytical workloads. It operates as a columnar, in-memory engine, ensuring high efficiency and robust concurrency handling.
  • Cell Abstraction: Polaris represents data using a unique “cell” abstraction with two dimensions:
    • Distributions: Aligns data efficiently.
    • Partitions: Enables data pruning.
  • Cell Awareness: Polaris elevates the optimizer framework in SQL Server by introducing cell awareness. Each cell holds its own statistics, vital for the Query Optimizer (QO). This empowers the QO to implement diverse execution strategies and sophisticated estimation techniques, unlocking its full potential.
2. Fabric Data Warehouse Features
  • Delta Lake Format: Fabric Warehouse persists data in Delta Lake format, ensuring reliability and transactional consistency.
  • Separation of State and Compute: By decoupling state and compute, Fabric Warehouse achieves enhanced resource scalability and flexible scaling.
  • Fine-Grained Orchestration: Task inputs are defined in terms of cells, allowing for fine-grained orchestration using state machines.
  • Cloud-Native and Scalable: Polaris, being cloud-native, supports both big data and relational warehouse workloads. Its stateless architecture provides the flexibility and scalability needed for modern data platforms.
key points Fabric lakehouse - Delta Lake

Webinar: Data Science for Business with Microsoft Fabric. Watch Now.

Let’s break down the components of the above architecture diagram

1. Delta Lake
  • Delta Lake is an optimized storage layer that serves as the cornerstone for storing data and tables within the Databricks lakehouse architecture. It extends Parquet data files by adding a file-based transaction log for ACID transactions and scalable metadata handling.
  • It ensures data reliability, consistency, and quality within your lakehouse architecture.
2. V-Order
  • V-Order is a write-time optimization applied to the Parquet file format. It significantly enhances read performance under Microsoft Fabric compute engines (such as SQL, Spark, Power BI etc.).
  • By sorting, distributing row groups, using dictionary encoding, and applying compression, V-Order reduces network, disk, and CPU resources during reads, resulting in cost efficiency and improved performance.
  • It has a 15% impact on average write times but provides up to 50% more compression. Importantly, Delta Lake stays fully compliant with the open-source Parquet format, ensuring compatibility.
3. Unified Data Lakehouse
  • You’re aiming for a unified architecture that combines the best of both data lakes and data warehouses.
  • Here’s how you can structure it (a minimal code sketch follows this list):
    • Bronze Zone: Raw, unprocessed data lands here.
    • Silver Zone: Data is cleaned, transformed, and enriched.
    • Gold Zone: Aggregated, curated data for business intelligence (BI) and machine learning (ML)/AI purposes.
    • Data Warehouse: The gold zone serves as your data warehouse, providing a trusted source for BI queries.
    • Fabric Copilot: Fabric Copilot ensures data quality and truth across all zones.
4. Integration

In summary, your architecture combines Delta Lake, V-Order, and a unified data lakehouse to achieve trusted data quality for ML/AI, BI, and analytics.
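
As a rough illustration of the bronze/silver/gold flow described above, here is a PySpark sketch for a Fabric notebook; the file path, table names, and columns are illustrative, and it assumes a default Lakehouse is attached (V-Order is applied by the Fabric Delta writer per the points above).

    # Read raw (bronze) data, apply a simple cleanup (silver), and publish a
    # curated (gold) Delta table for BI and ML consumption.
    bronze = spark.read.format("csv").option("header", "true") \
        .load("Files/bronze/sales_raw.csv")

    silver = bronze.dropDuplicates().na.drop(subset=["order_id"])
    silver.write.mode("overwrite").format("delta").saveAsTable("silver_sales")

    gold = spark.sql("""
        SELECT region, SUM(CAST(amount AS DOUBLE)) AS total_amount
        FROM silver_sales
        GROUP BY region
    """)
    gold.write.mode("overwrite").format("delta").saveAsTable("gold_sales_by_region")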

The Fabric, a unified SaaS experience, integrates Data Observability within its architecture.

Here are the key points

  • Unified Platform: Fabric combines capabilities for analytics across Microsoft Azure Data Factory, Power BI, and the next-gen Synapse.
  • Comprehensive Offerings: Fabric provides Data Governance, Data Security, Data Integration, Data Engineering, Data Warehousing, Data Science, Real-time Analytics, Applied Observability (via the Data Activator), and Business Intelligence.

The Data Activator experience within the Fabric ecosystem is a novel module designed for real-time data detection and monitoring.

Let’s explore its key features

1. Real-Time Data Detection
  • The Data Activator continuously scans incoming data streams, identifying patterns, anomalies, and events in real time.
  • It leverages machine learning algorithms and statistical techniques to detect changes, spikes, or deviations from expected behavior.
  • Whether it’s sudden spikes in website traffic, unexpected sensor readings, or unusual transaction patterns, the Data Activator raises alerts promptly.
2. Monitoring and Alerting
  • Once detected, the Data Activator triggers alerts or notifications to relevant stakeholders.
  • These alerts can be customized based on severity levels, thresholds, and specific conditions.
  • Monitoring dashboards provide real-time visibility into data health, allowing data engineers and analysts to take immediate action.
3. Adaptive Learning
  • The Data Activator learns from historical data and adapts its detection algorithms over time.
  • As new data arrives, it refines its models, ensuring accurate and relevant alerts.
  • Adaptive learning helps reduce false positives and enhances the system’s responsiveness.
4. Integration with Fabric Components
  • The Data Activator seamlessly integrates with other Fabric components, such as data pipelines, data lakes, and analytics workflows.
  • It complements existing observability features, enhancing the overall data management experience.
  • By providing real-time insights, it empowers organizations to proactively address data quality, compliance, and operational challenges.
Microsoft Fabric Analytics

Conclusion

A unified data management approach combines data quality, observability, cataloging, governance, and lineage. It centralizes and automates data workflows, enabling organizations to harness the full potential of their data and analytics investments, and plays a key role in enterprise digitalization with AI.

Microsoft Fabric Data Warehouse, powered by the Polaris engine, seamlessly bridges the gap between data warehousing and big data, all while embracing cloud-native principles.

The Data Activator experience is a crucial part of Fabric’s commitment to data observability, ensuring that data anomalies and issues are swiftly detected and addressed.

Microsoft Fabric Warehouse Deep Dive into Polaris Analytic Engine

Introduction

Microsoft Fabric Data Warehouse is a cloud-native data warehousing solution that leverages the Polaris distributed SQL query engine.

Polaris is a stateless, interactive relational query engine that powers the Fabric Data Warehouse.

  1. It is designed to unify data warehousing and big data workloads while segregating compute and state for seamless cloud-native operations. Polaris is a distributed analytics system that is optimized for analytical workloads.  
  2. It is built from the ground up to serve the needs of today’s data platforms.  
  3. It is a columnar, in-memory engine that is highly efficient and handles concurrency well.  

I hope this helps you understand the inner workings of Microsoft Fabric Data Warehouse and the Polaris engine.  

MS Fabric: the data platform

Fabric Warehouse – Polaris Analytics Engine

The decoupling of compute and storage found in Synapse Dedicated SQL Pool is taken a step further in Microsoft Fabric Data Warehouse, which also decouples compute and state, allowing for enhanced resource scalability and flexible resource scaling. 

In stateful architectures, the state for inflight transactions remains stored in the compute node until the transaction commits, rather than being immediately hardened into persistent storage. As a consequence, in the event of a compute node failure, the state of non-committed transactions becomes lost, leaving no recourse but to terminate in-flight transactions. In summary, stateful architectures inherently lack the capability for resilience to compute node failure and elastic assignment of data to compute resources. 

However, decoupling of compute and storage is not the same as decoupling compute and state. In stateless compute architectures, compute nodes are designed to be devoid of any state information, meaning all data, transactional logs, and metadata must reside externally. This approach enables the application to partially restart query execution in case of compute node failures and smoothly adapt to real-time changes in cluster topology without disrupting in-flight transactions. 

MS Fabric - evolution of data warehouse

The evolution of data warehouse architectures over the years.

Data abstraction

Polaris represents data using a “cell” abstraction with two dimensions:
  • Distributions (data alignment)
  • Partitions (data pruning)
MS Fabric - Data abstraction

Polaris significantly elevates the optimizer framework in SQL Server by introducing cell awareness, where each cell holds its own statistics, vital for the Query Optimizer (QO). The QO, benefiting from Polaris’ cell awareness, implements a wide array of execution strategies and sophisticated estimation techniques, unlocking its full potential. In Polaris, a dataset is represented as a logical collection of cells, offering the flexibility to distribute them across compute nodes to achieve seamless parallelism. 

To achieve effective distribution across compute nodes, Polaris employs distributions that map cells to compute nodes and hash datasets across numerous buckets. This intelligent distribution enables the deployment of cells across multiple compute nodes, making computationally intensive operations like joins and vector aggregation attainable at the cell level, sans data movement, provided that the join or grouping keys align with the hash-distribution key. 

Furthermore, partitions play a crucial role in data pruning, selectively optimizing data for range or equality predicates defined over partition keys. This optimization is employed only when relevant to the query, ensuring efficiency. 

A remarkable feature is the physical grouping of cells in storage as long as they can be efficiently accessed (diagonal green and blue stripes cells in the image above), allowing queries to selectively reference entire cell dimensions or even individual cells based on predicates and operation types present in the query, granting unparalleled flexibility and performance. 
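
To make the cell abstraction concrete, here is a small, purely illustrative Python sketch (not Polaris code) that maps rows to (partition, distribution) cells using a hash-distribution key; the column names and bucket count are assumptions.

    import hashlib

    NUM_DISTRIBUTIONS = 8  # number of hash buckets; purely illustrative

    def cell_for(row: dict, hash_key: str, partition_key: str) -> tuple:
        # The distribution dimension aligns rows for joins/aggregations on the
        # hash-distribution key; the partition dimension enables data pruning
        # on range or equality predicates over the partition key.
        digest = hashlib.md5(str(row[hash_key]).encode()).hexdigest()
        distribution = int(digest, 16) % NUM_DISTRIBUTIONS
        partition = row[partition_key]
        return (partition, distribution)

    # Two rows with the same customer_id land in the same distribution, so a join
    # on customer_id needs no data movement between compute nodes.
    rows = [
        {"customer_id": 42, "order_date": "2024-01-15", "amount": 10.0},
        {"customer_id": 42, "order_date": "2024-02-03", "amount": 25.0},
    ]
    for r in rows:
        print(cell_for(r, hash_key="customer_id", partition_key="order_date"))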

The Polaris distributed query processing (DQP) operates precisely at the cell level, regardless of what is within each cell. The data extraction from a cell is seamlessly handled by the single-node query execution (QE) engine, primarily driven by SQL Server, and is extensible for accommodating new data types with ease. 

Flexible assignment of cells to compute

The Polaris engine is resilient to compute failures because of the flexible allocation of cells to compute nodes. When a node failure or topology change occurs (scale up or down), the cells of the lost node can be efficiently re-assigned to the remaining topology. To achieve this flexibility, the system maintains metadata, including the assignment of cells to compute nodes at any given time, in a durable manner outside the compute nodes. This means that the critical information about the cell-to-compute-node mapping is stored in reliable, persistent external storage, ensuring its availability even in the face of node failures. 

This design enhances overall resilience. By adopting this approach, the Polaris engine can quickly recover from node failures or topology changes, dynamically redistributing cells to healthy compute nodes and ensuring uninterrupted query processing across the entire system.
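As a rough illustration of how a durable cell-to-node mapping enables recovery, the hypothetical sketch below rebuilds the assignment from externally stored metadata when a node drops out of the topology; it is a simplification, not the actual Polaris mechanism.

```python
def reassign_cells(cell_to_node, healthy_nodes):
    """Illustrative only: rebuild the cell-to-node assignment after a
    failure or topology change, using the durable mapping as input."""
    new_assignment = {}
    for i, cell in enumerate(sorted(cell_to_node)):
        # Spread cells evenly across the nodes that remain healthy.
        new_assignment[cell] = healthy_nodes[i % len(healthy_nodes)]
    return new_assignment

# Durable metadata (kept outside the compute nodes) records which node owns each cell.
cell_to_node = {("d0", "p1"): "node-1", ("d1", "p1"): "node-2", ("d2", "p2"): "node-3"}

# node-2 fails; its cells are redistributed across the surviving topology.
print(reassign_cells(cell_to_node, healthy_nodes=["node-1", "node-3"]))
```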


From queries to task DAGs

The Polaris engine follows a two-phased approach for query processing:
1. Compilation using SQL Server Query Optimizer:

In the first phase, the Query Optimizer takes the query and generates all possible logical plans. A logical plan represents different ways the query can be executed without considering the physical implementation details.

 2. Distributed Cost-Optimization:

In the second phase, it enumerates all the physical implementations corresponding to the previously generated logical plans. Each physical implementation represents a specific execution strategy, considering the actual resources available across the distributed system. The goal of this cost-optimization phase is to identify and select the most cost-efficient physical implementation of the logical plan. It then picks one with the least estimated cost and the outcome is a good distributed query plan that takes data movement cost into account. 

A task is the physical execution of an operator produced by the two-phased optimization; together, these tasks and their dependencies form a directed acyclic graph (DAG).

A task has three components:  
  1. Inputs – Collections of cells for each input’s data partition. 
  2. Task template - Code to execute on the compute nodes 
  3. Output - dataset represented as a collection of cells produced by the task. It can be either an intermediate result or the final result to return to the user. 

Basically, at run time, a query is transformed into a query task DAG, which consists of a set of tasks with precedence constraints. 
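The sketch below is a simplified, hypothetical illustration of a query task DAG: each task carries its inputs (cells), a task template, and an output, and precedence constraints determine execution order. It uses Python's standard graphlib module and is not Polaris code.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

@dataclass
class Task:
    name: str
    inputs: list          # collections of cells for each input data partition
    template: str         # code/operator to execute on the compute nodes
    depends_on: list = field(default_factory=list)
    output: list = field(default_factory=list)  # cells produced by the task

# A tiny query DAG: two scan tasks feeding a join task.
tasks = {
    "scan_orders":    Task("scan_orders",    inputs=[("d0", "2024-07")], template="SCAN orders"),
    "scan_customers": Task("scan_customers", inputs=[("d0", "*")],       template="SCAN customers"),
    "join":           Task("join", inputs=[], template="HASH JOIN on customer_id",
                           depends_on=["scan_orders", "scan_customers"]),
}

# Precedence constraints drive the execution order.
order = TopologicalSorter({name: t.depends_on for name, t in tasks.items()})
for name in order.static_order():
    print("run", name, "->", tasks[name].template)
```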

Task Orchestration

A new design in Polaris is a novel hierarchical composition of finite state machines. The novelty lies in this hierarchical composition, which captures the execution intent. Polaris takes a different approach from conventional directed acyclic graph (DAG) execution frameworks by providing a state machine template that orchestrates the execution. 

By using it, Polaris gains a significant advantage in terms of formalizing failure recovery mechanisms. The state machine recorder, which operates as a log, enables the system to observe and replay the execution history. This capability proves invaluable in recovering from failures, as it allows the system to precisely recreate the execution sequence and take corrective actions as needed. 

A query involves three kinds of entities: the query DAG, the task templates, and the tasks. The execution state of each entity is monitored through an associated state machine, encompassing a finite set of states and state transitions. Each entity’s state is a result of composing the states of the entities from which it is built. By utilizing state machines to track and manage the entities’ states, Polaris gains greater control over its overall execution, promoting better coordination, and facilitating the implementation of necessary actions based on the current state. 

States can be:
  1. Simple - denotes the success, failure, or readiness of a task template 
  2. Composite - denotes an instantiated task template or a blocked task template

A composite state differs from a simple state in that its transition to another state is defined by the result of the execution of its dependencies.


In summary, the hierarchical state machine composition in Polaris ensures a structured representation of execution intent, providing better control over query execution, recovery from failures, and the ability to analyze and replay execution history. 
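As a rough illustration of the idea, the hypothetical Python sketch below composes a template-level (composite) state from the simple states of its instantiated tasks; the real Polaris hierarchy is considerably richer.

```python
class TaskState:
    """Simple states: a leaf task is READY, RUNNING, SUCCEEDED, or FAILED."""
    READY, RUNNING, SUCCEEDED, FAILED = "ready", "running", "succeeded", "failed"

class TaskTemplateState:
    """Composite state: derived from the states of the tasks instantiated
    from the template, mimicking hierarchical composition."""

    def __init__(self, task_states):
        self.task_states = task_states  # e.g. {"task-1": "running", ...}

    @property
    def state(self):
        states = set(self.task_states.values())
        if TaskState.FAILED in states:
            return "failed"          # propagate failure upward for recovery
        if states == {TaskState.SUCCEEDED}:
            return "succeeded"       # all children done -> template done
        return "in_progress"

template = TaskTemplateState({"task-1": TaskState.SUCCEEDED, "task-2": TaskState.RUNNING})
print(template.state)  # in_progress

template.task_states["task-2"] = TaskState.FAILED
print(template.state)  # failed -> the orchestrator can replay/recover this subtree
```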

Migrating to Fabric ​Warehouse

In this eBook, we will outline how Microsoft Fabric can significantly reduce the issues facing traditional data warehousing and provide a scalable platform for future growth.

Get the eBook

Service Architecture


Polaris Architecture

The Polaris architecture and all services within a pool are stateless. Data is stored remotely and abstracted as data cells, while metadata and transaction log state are off-loaded to centralized services, so two or more pools can share them. Placing this state in centralized services, coupled with a stateless micro-service architecture within each pool, means multiple compute pools can transactionally access the same logical database. 

The SQL Server Front End (SQL-FE) is the service responsible for compilation, authorization, authentication, and metadata. 

The Distributed Query Processor (DQP) is responsible for distributed query optimization, distributed query execution, query execution topology management, and workload management (WLM). 

Finally, a Polaris pool consists of a set of compute servers each with a dedicated set of resources (disk, CPU, and memory). Each compute server runs two micro-services: 

  • Execution Service (ES) - that is responsible for tracking the life span of tasks assigned to a compute container by the DQP 
  • SQL Server instance - that is used as the backbone for the execution of the template query for a given task and holding a cache on top of local SSDs 

The data channel serves a dual purpose: it facilitates the transfer of data between compute servers and also acts as the pipe through which compute servers transmit results to the SQL Frontend (FE). 

Tracking the complete journey of a query is the control flow channel's responsibility: it tracks the progression of the query from the SQL FE to the DQP and subsequently from the DQP to the Execution Service.

Migrate Traditional Data Warehouses to Fabric Modern Warehouse. Watch Now.

Auto-Scale

As demand fluctuates, the Polaris engine requests additional computational resources, effectively asking for more containers from the underlying Fabric capacity. This adaptive approach ensures seamless accommodation of workload peaks. Behind the scenes, the engine redistributes tasks to the newly added containers while maintaining the continuity of ongoing tasks. Scaling down is transparent and automatic when workload utilization drops.

Resilience to Node Failures

The Polaris engine is resilient by autonomously recovering from node failures and intelligently redistributing tasks to healthy nodes. This functionality is seamlessly integrated into the hierarchical state machine, as discussed earlier. This mechanism plays a critical role in enabling effective scalability for large queries since the probability of node failure increases with the number of nodes involved.

Hot spot recovery

The Polaris engine manages challenges like hot spots and skewed computations through a feedback loop between the DQP and the Execution Service. This mechanism monitors the lifecycle of execution tasks hosted on nodes. Upon detecting an overloaded compute node, it automatically redistributes a subset of tasks to a less burdened compute node. If this does not resolve the problem, the Polaris engine falls back to its auto-scale feature, adding computational resources to mitigate the issue. 
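The hypothetical sketch below illustrates the general shape of such a feedback loop: move excess tasks from overloaded nodes to less loaded ones, and fall back to auto-scaling when no headroom remains. It is a simplification for illustration only.

```python
def rebalance(node_tasks, overload_threshold=8):
    """Toy feedback loop: move tasks off overloaded nodes; if every node is
    overloaded, signal that auto-scale should add capacity instead."""
    loads = {node: len(tasks) for node, tasks in node_tasks.items()}
    hot = [n for n, load in loads.items() if load > overload_threshold]
    cool = [n for n, load in loads.items() if load <= overload_threshold]

    if hot and not cool:
        return "auto-scale"  # no headroom left: request more containers

    for node in hot:
        # Shift the excess tasks to the least-loaded healthy node.
        while len(node_tasks[node]) > overload_threshold:
            target = min(cool, key=lambda n: len(node_tasks[n]))
            node_tasks[target].append(node_tasks[node].pop())
    return "rebalanced"

node_tasks = {"node-1": list(range(12)), "node-2": list(range(3))}
print(rebalance(node_tasks), {n: len(t) for n, t in node_tasks.items()})
```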

Conclusion:

Separation of state and compute, a flexible abstraction of datasets as cells, task inputs defined in terms of cells, and fine-grained orchestration of tasks using state machines together give Polaris its flexibility and scalability.

Delta-optimized V-Order applies write-time optimizations to the Parquet file format: special sorting, row-group distribution, dictionary encoding, and compression. Because all data lands in this open file format and bin compaction is handled automatically, there is no need to write manual cleanup code. Polaris is cloud-native, now supports both big data and relational warehouse workloads, and its stateless architecture provides flexibility and scalability.

Though some of the functions from the dedicated SQL pool (DW) are missing, we feel Fabric Data Warehouse is promising. We are working on a benchmark comparison in our next blogs.

Disclaimer: Some of the content presented in this blog is drawn from the original research paper (PVLDB reference format: Josep Aguilar-Saborit, Raghu Ramakrishnan, et al., VLDB conferences, Microsoft Corp.). We have added our own comments and views.

The post Microsoft Fabric Warehouse Deep Dive into Polaris Analytic Engine appeared first on Netwoven.

11 Best Practices for Securing Data in the Cloud https://netwoven.com/cloud-infrastructure-and-security/best-practices-for-securing-data/ https://netwoven.com/cloud-infrastructure-and-security/best-practices-for-securing-data/#respond Thu, 23 Nov 2023 17:45:55 +0000 https://netwoven.com/?p=47948 Introduction In today’s digital age, businesses are increasingly relying on cloud services to store, manage, and process their data. While the cloud offers unparalleled flexibility and scalability, it also introduces… Continue reading 11 Best Practices for Securing Data in the Cloud

Introduction

In today’s digital age, businesses are increasingly relying on cloud services to store, manage, and process their data. While the cloud offers unparalleled flexibility and scalability, it also introduces new security challenges that need to be addressed. Protecting sensitive information is paramount as data becomes the lifeline of businesses, and organizations must adopt robust practices to ensure the security of their data in the cloud. In this blog post, we will explore best practices for securing data in cloud services.

1. Choose a Reputable Cloud Service Provider


Selecting a reputable cloud service provider is the foundation of a secure cloud strategy. Research providers thoroughly, considering factors such as their security certifications, compliance measures, and reputation within the industry. Established providers often invest heavily in security infrastructure, offering a more secure environment for your data.

2. Understand Shared Security Responsibilities

It is important to understand that the cloud provider is not solely responsible for securing your data. Depending on the type of cloud service model (such as Software as a Service, Platform as a Service, or Infrastructure as a Service), the security responsibilities of the cloud service provider and the customer vary. 

For example, in a SaaS model, the provider is responsible for most of the security aspects, such as the application, the data, the network, and the infrastructure. In an IaaS model, the customer is responsible for most of the security aspects, such as the data, the operating system, the network, and the firewall. Businesses should understand their security responsibilities and obligations and ensure that they are met. 

3. Implement Encryption (Data at Rest, in Transit, and During Processing)


Implement robust encryption practices to protect your data at all stages. Encrypt sensitive information at rest in cloud storage, during transit between your systems and the cloud, and even during processing within the cloud environment. This ensures that even if unauthorized access occurs, the data remains unreadable without the proper decryption keys. The keys themselves should be stored in a safe location, and it is also recommended to implement Data Loss Prevention (DLP) policies.
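As a simple illustration of encrypting data before it leaves your application, the sketch below uses the Fernet primitives from the widely used Python cryptography package; in practice, keys should come from a managed key vault or key management service rather than being generated and held in application code.

```python
from cryptography.fernet import Fernet

# In production the key should come from a key management service / key vault,
# never be hard-coded or stored next to the data it protects.
key = Fernet.generate_key()
cipher = Fernet(key)

plaintext = b"customer_id,email\n42,jane@example.com"

# Encrypt before writing to cloud storage (data at rest)...
ciphertext = cipher.encrypt(plaintext)

# ...and decrypt only after retrieval, inside a trusted environment.
assert cipher.decrypt(ciphertext) == plaintext
```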

4. Regularly Update and Patch Systems

Software updates are important for fixing bugs, improving performance, and enhancing security. Businesses should check their software for vulnerabilities on a regular basis and promptly apply patches to operating systems, applications, browsers, and antivirus programs to reduce the risk of exploitation by malicious actors or cybercriminals. They should also enable automatic updates whenever possible and avoid using outdated or unsupported software. 

5. Use Strong Authentication


Implement strong authentication and access controls to restrict access to your cloud data and resources. This includes using multi-factor authentication (MFA) for all users and enforcing least privilege access. MFA adds an extra layer of security by requiring users to verify their identity through multiple methods, such as a password and a mobile authentication code before accessing sensitive data. This significantly reduces the risk of unauthorized access. 
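For illustration, the sketch below shows time-based one-time password (TOTP) verification, a common second factor, using the open-source pyotp library; real deployments would rely on your identity provider's built-in MFA rather than hand-rolled checks.

```python
import pyotp

# Enrolment: generate a per-user secret and share it with the user's
# authenticator app (typically via a QR code).
secret = pyotp.random_base32()
totp = pyotp.TOTP(secret)

# Sign-in: after the password check, require the current 6-digit code.
code_from_user = totp.now()  # in reality, typed in by the user
if totp.verify(code_from_user):
    print("Second factor accepted")
else:
    print("Second factor rejected")
```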

6. Regular Monitoring and Auditing

Continuous monitoring and auditing of user activities in the cloud environment can help detect suspicious behavior and potential security incidents. This includes reviewing configurations, access logs, and user activities. Automated tools can assist in identifying and mitigating potential security risks, helping you stay one step ahead of potential threats. 

Parallelly, regular security audits and the deployment of robust monitoring and logging solutions help detect potential vulnerabilities, weaknesses, and unusual activities in your cloud infrastructure. Set up alerts for suspicious behavior, unauthorized access attempts, or changes in data access patterns, and establish response protocols to address incidents promptly. The quicker you can identify and respond to security threats, the better you can protect your data. A particularly useful technique is to implement a Security Information and Event Management (SIEM) solution, which can collect and analyze security logs from your cloud environment to identify suspicious activity and potential security threats. 
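As a tiny illustration of the kind of rule a monitoring pipeline or SIEM might apply, the hypothetical sketch below flags accounts with repeated failed sign-ins in an access log; production systems correlate far richer signals.

```python
from collections import Counter

def detect_bruteforce(access_log, threshold=5):
    """Toy detection rule: alert when one account accumulates too many
    failed sign-ins; real SIEM rules correlate far richer signals."""
    failures = Counter(e["user"] for e in access_log if e["result"] == "failure")
    return [user for user, count in failures.items() if count >= threshold]

access_log = [
    {"user": "alice", "result": "failure"} for _ in range(6)
] + [{"user": "bob", "result": "success"}]

print(detect_bruteforce(access_log))  # ['alice'] -> raise an alert / open an incident
```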

7. Implement Access Controls

Enforce strict access controls to limit who can access your data at all levels, including the application level, network level, and user level. Use identity and access management (IAM) tools to grant the minimum necessary permissions to users and applications. Implement role-based access control (RBAC) to restrict permissions based on job roles. Regularly review and update permissions as roles within your organization change. It is also recommended to adopt and implement the Zero Trust security model across your organization.  
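The hypothetical sketch below illustrates the core idea of role-based access control: permissions are attached to roles, and a request is allowed only if one of the caller's roles carries the required permission.

```python
# Hypothetical role definitions: each role maps to the minimum set of
# permissions needed for that job function (least privilege).
ROLE_PERMISSIONS = {
    "analyst":  {"dataset:read"},
    "engineer": {"dataset:read", "pipeline:run"},
    "admin":    {"dataset:read", "dataset:write", "pipeline:run", "user:manage"},
}

def is_allowed(user_roles, permission):
    """Grant access only if one of the user's roles carries the permission."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)

print(is_allowed(["analyst"], "dataset:write"))   # False - least privilege holds
print(is_allowed(["engineer"], "pipeline:run"))   # True
```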

8. Back Up Data Regularly


Data loss can occur for various reasons, including accidental deletion, hardware failures, natural calamities, or cyber-attacks. Regularly back up your data to a separate, secure location and test the restoration process to ensure quick recovery in the event of a data loss incident. It is recommended to keep at least one set of backups outside of the current cloud infrastructure to ensure that they are not affected by a cloud outage or security breach. 

9. Use a VPN and Firewall

A VPN (Virtual Private Network) is a technology that creates a secure and encrypted connection between a device and a network. A VPN can help businesses protect their data in the cloud by hiding their IP address, encrypting their traffic, and preventing eavesdropping and interception by third parties. Businesses should use a VPN when accessing their cloud services, especially from public or untrusted networks, such as Wi-Fi hotspots. 

A firewall is a device or software that monitors and controls incoming and outgoing network traffic based on predefined rules. A firewall can help businesses protect their data in the cloud by blocking or allowing specific types of traffic, such as ports, protocols, or IP addresses. Businesses should use a firewall to secure their network perimeter and configure it according to their security needs and objectives.

10. Have an Incident Response Plan


Even with the best security measures in place, there is always the risk of a security incident. Develop a comprehensive incident response plan that outlines the steps to take in the event of a security breach. This plan should include communication strategies, legal considerations, and a clear roadmap for restoring normal operations along with containment, eradication, and recovery.

11. Conduct Regular Security Training


Employees are often the weakest link in the security chain, as they may fall victim to phishing, malware, or social engineering attacks, or may unintentionally expose or compromise data in the cloud. Businesses should educate their employees about the security risks and best practices of cloud computing and provide them with clear and enforceable policies and guidelines. They should also train their employees on how to detect and report suspicious or malicious activities, and how to use cloud services securely and responsibly. A best practice is to conduct Attack Simulation Training for employees periodically. 

Conclusion

Securing data in cloud services requires a proactive and multifaceted approach. By adhering to these recommended practices, organizations can fortify their security framework, safeguard critical data, and guarantee the resilience of their cloud infrastructure. The threat landscape is ever evolving, and staying informed is crucial. Regularly update your knowledge about emerging cybersecurity threats and trends. Embrace the power of the cloud securely and navigate the digital landscape with confidence. 

The post 11 Best Practices for Securing Data in the Cloud appeared first on Netwoven.

Thinking of Data Democratization? Microsoft Fabric Will Help You Adopt Data Mesh Culture  https://netwoven.com/data-engineering-and-analytics/data-democratization-with-microsoft-fabric-data-mesh/ https://netwoven.com/data-engineering-and-analytics/data-democratization-with-microsoft-fabric-data-mesh/#respond Tue, 14 Nov 2023 13:47:54 +0000 https://netwoven.com/?p=47902 Introduction Like hundreds of leading enterprises, you probably have realized that treating data as assets, closely managed by a highly specialized central team is creating huge bottleneck for actual data… Continue reading Thinking of Data Democratization? Microsoft Fabric Will Help You Adopt Data Mesh Culture 

Introduction

Like hundreds of leading enterprises, you have probably realized that treating data as assets closely managed by a highly specialized central team creates a huge bottleneck for the actual data owners and consumers.


If you are still continuing with a centralized data culture, you might be feeling constrained trying to scale appropriately to accommodate the huge influx of data volume and diversity and the growing demand for data-driven insight.

If you have reached here from a Google search, you must already be seriously contemplating a transition to a culture where data is managed in the domain that owns it, and domains are responsible for sharing data with others inside or outside the organization.

Why is monolithic data architecture no longer supporting big enterprises?

Monolithic data architectures like data warehouses and data lakes were designed with the concept of storing and serving an organization’s vast amount of operational data in one centralized location. The thought process was that a specialized data team would ensure the data is clean, accurate, and properly formatted, and that data consumers would be served with high-quality contextual data for a wide range of analytical use cases. However, in reality, this is not always the case. While centralization strategies saw initial success for organizations with smaller data volumes and fewer data consumers, bottlenecks started to develop as data sources and consumers increased.    

As enterprises grow, their data requirements become more complex. Monolithic architectures are often difficult to scale horizontally to meet increased data volumes and processing demands, which can lead to performance bottlenecks and limit an organization’s ability to handle big data effectively.


Over 80% of enterprise data remains Dark Data. This data does not provide the organization with any insight for making business decisions.

Modern enterprises deal with a wide variety of data types, including structured, semi-structured, and unstructured data. Monolithic architectures are typically optimized for handling structured data, making it challenging to efficiently process and analyze diverse data sources. Different business units of an enterprise might have completely different data needs in terms of sources, data types, and processing logic. Accommodating all of these diverse requests has become really challenging for a central team, both in terms of domain knowledge and the technology involved. This results in mounting frustration among data owners and consumers, and much of the data may not even be referred to the central data team. As a result, a lion’s share of enterprise data remains unexplored, referred to as Dark Data.

We can summarize the challenges of enterprises having a monolithic data architecture as below:

  • Disconnects data from data owners (product/service experts).
  • Data processing architecture is not aligned with the business axes of change.
  • Tight coupling between stages of the data processing pipeline impacts the flexibility the business needs.
  • Creates highly specialized and isolated engineering teams.
  • Creates backlogs focused on technical changes rather than business-functional changes.

How is Data Mesh, i.e., domain-centric federated data management, helping big enterprises?

Data Mesh is founded on four principles:
Domain Oriented Ownership

Data is owned and managed by the teams or business units (domains) that generate and use it. This aligns data responsibility with the domain’s expertise, making it more manageable and relevant to its specific context.

Data as a Product

Data is treated as a product rather than a byproduct of business processes. Each domain is responsible for creating and maintaining its own data as a product, which is designed to meet the needs of the domain’s consumers. Data products include well-defined data sets, APIs, and documentation.

Self-serve Data Platform

Development of self-serve data infrastructure that enables easy and secure access to data for data consumers. This infrastructure includes data catalogs, data discovery tools, and standardized interfaces for accessing data products.

Federated Computational Governance

A structured approach to governance within a data mesh. It ensures that domains maintain a degree of independence while adhering to essential governance standards, fostering interoperability, and enabling the organization to leverage the full potential of its data ecosystem. This approach relies on the fine-grained codification and automation of policies to streamline governance activities and reduce overhead.

The basic concept of Data Mesh is simple: help data owners, such as functional or business units, manage their own data, as they understand their data best. Removing the dependency on a central data team, domains enjoy autonomy and can scale as needed.

Along with the independence comes accountability as a single team is responsible for the data from production to consumption. This encourages domain teams to take responsibility for the quality, accuracy, and accessibility of their data. This, in turn, can lead to better data governance and more reliable data.

As domains take ownership of data, they ensure that potential consumers of their data, be it other domains within the organization or external consumers, can easily discover, trust, and access the data. They also ensure that published datasets follow certain standards in terms of schema and metadata so that the data is interoperable with other datasets. This is where the product mindset becomes relevant, and data is managed and published as a product. This makes it easier for data consumers to find the data they need, fostering self-service analytics and reducing the time and effort required to locate relevant information.
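To illustrate the product mindset, here is a hypothetical sketch of the kind of descriptor a domain might publish alongside its dataset so that it is discoverable, addressable, and trustworthy; the field names and endpoint URIs are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Hypothetical descriptor a domain might publish alongside its dataset
    so consumers can discover, trust, and address it."""
    name: str
    owner_domain: str
    description: str
    schema: dict                      # column name -> type, for interoperability
    endpoints: dict                   # access method -> address (illustrative URIs)
    certified: bool = False           # set by the domain once quality checks pass
    tags: list = field(default_factory=list)

sales_orders = DataProduct(
    name="sales_orders_curated",
    owner_domain="Sales",
    description="Daily curated sales orders, deduplicated and conformed.",
    schema={"order_id": "string", "order_date": "date", "amount": "decimal(18,2)"},
    endpoints={"sql": "warehouse://sales/sales_orders", "files": "onelake://sales/orders/"},
    certified=True,
    tags=["sales", "orders", "gold"],
)
```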

Domain-centric architectures are designed to scale with an enterprise’s growing data needs. When new domains or data sources are added, they can be integrated without significantly affecting existing domains. This flexibility allows organizations to adapt to changing business requirements and incorporate new data sources and technologies more easily.

Data silos are a common problem in large enterprises. A domain-centric approach helps break down these silos by promoting collaboration and data sharing between different parts of the organization. Domains can act as data product teams, providing standardized, well-documented data interfaces for others to use.

Domain-centric architectures encourage smaller, more focused teams to develop and manage data products. This can lead to quicker development cycles, faster iterations, and the ability to innovate more rapidly. It also reduces the risk of bottlenecks in data delivery. Also, because of the ownership, teams are more likely to ensure data quality and consistency within their specific domains. This can result in better data reliability and trustworthiness across the enterprise.

What kind of challenges do enterprises face while adopting domain-centric data management?

Analyzing the experiences of several enterprises, the challenges companies face in adopting a domain-centric federated model fall mainly into three categories:

  • Management acceptance
  • Dealing with cultural shift
  • Governance challenges

We need to accept that this is a significant change that needs support from top management to push it down through the ranks of the organization. The main challenge is structural. For decades, industries have been habituated to dealing with data in a centralized manner, and organizational roles and responsibilities were defined accordingly. Changing over to a data federation model threatens to significantly alter the scope of those roles. Along with that, domains need to be equipped with the skills, infrastructure, and controls to perform data processing and management on their own.

While each domain has autonomy, there’s a need for consistent data governance across domains. Coordinating and enforcing data governance policies at the enterprise level while allowing autonomy at the domain level can be a delicate balance. This applies to both consistent data quality and interoperability of data produced by different domains. Ensuring that each domain adheres to the enterprise’s data quality standards while allowing flexibility for domain-specific requirements is a delicate task. Similarly, ensuring each domain adhere to similar data formats, schemas, or data processing technologies is crucial.

To overcome these challenges, a dedicated team empowered by top management’s commitment needs to work in phases with different domains, beginning with a few pilot transitions. Running pilot programs with the most willing groups increases the chance of early success and helps win the trust of others across the organization.

Treat Your Data as Product with Microsoft Fabric

Welcome to a new era of data management and transformation! Our latest ebook, “Treat Your Data as Product with Microsoft Fabric”, is your definitive guide to revolutionizing the way you perceive, manage, and leverage your data assets.

Get the eBook

Bonus Read: Data Fabric vs. Data Mesh

Microsoft Fabric will help ease transition challenges

Microsoft Fabric has been designed to support organizations adopt the domain centric data culture in a streamlined manner. As Data Mesh is a socio-technical endeavor, organizations need to gear themselves up for the change, while Fabric can largely address the technical aspects.

Microsoft Fabric consolidates many technologies under one platform to offer end to end data management with great flexibility to accommodate diversity in organizational culture, structure, and processes. Four principles of Data Mesh can be mapped to one or more of Fabric components.

Data Mesh Workflow

Data Mesh architecture empowers business units to structure their data as per their specific needs. Organizations can define data boundaries and consumption patterns as per business needs. Fabric allows organizations to define domains that map to business units and to associate workspaces with each domain. Data artifacts like lakehouses, pipelines, notebooks, etc., can be created within workspaces. Federated governance can be applied at a granular level through domains and workspaces.

Fabric’s design also implements a framework to support datasets as products, which is a major recommendation of Data Mesh. Any dataset that is promoted from a workspace will be listed by OneLake as an available dataset and becomes discoverable. A listed dataset is published with metadata that gives consumers enough detail about the dataset. The workspace owner can also mark a dataset as certified, which makes the dataset trustworthy for consumers. A dataset can be published through multiple endpoints to facilitate access through native tools and methods, and the specific addresses for these endpoints are also published through OneLake. This satisfies Data Product characteristics like being addressable and natively accessible. Microsoft Purview is now integrated with Fabric to further bolster data discovery. So, using Fabric, organizations get the right technology support to package their datasets as products for sharing inside or outside the organization.

Fabric makes it easy for data owners to perform most data processing activities themselves using the OneLake interface. Very few deep or complex data processing and transformation skills are needed to perform such activities in Fabric. This enables a large section of business users, who are the actual data owners and/or consumers, to self-serve their data requirements.

In terms of federated governance, another principle of Data Mesh, Fabric is making significant strides. Microsoft Purview is a unified data governance service that is integrated with Fabric to enable a foundational understanding of your OneLake. Purview helps with automatic data discovery, sensitive data classification, and end-to-end data lineage, and enables data consumers to access valuable, trustworthy data.

Don’t miss this opportunity to expand your knowledge and harness the full potential of Microsoft Fabric. Click here to watch our webinar.

What is your next step towards data democratization?

By now, you must agree that data democratization is a must for your organization if you envision continued growth. You do not want a severe bottleneck in the form of a centralized data architecture supported by a highly specialized team. You want to provide your business units more data autonomy so they can scale as business demands.

You also have understood that the transition is not easy, as this is a techno-socio change. You do need the support of a good technology platform as well as a good partner with experience driving similar transitions with other enterprises. Microsoft Fabric holds great promise as one such platform. You may refer to the eBook, Deliver your data as product for consumers using Microsoft Fabric, for detailed transition steps, challenges, and Fabric features that help you implement Data Mesh principles in your organization.

For a workshop on your prospect of transition to Data Democratization, please contact Netwoven Inc.

The post Thinking of Data Democratization? Microsoft Fabric Will Help You Adopt Data Mesh Culture  appeared first on Netwoven.

How Microsoft Purview Facilitates Regulatory Compliance for Semiconductor Manufacturing Industry  https://netwoven.com/cloud-infrastructure-and-security/microsoft-purview-facilitates-regulatory-compliance/ https://netwoven.com/cloud-infrastructure-and-security/microsoft-purview-facilitates-regulatory-compliance/#respond Thu, 26 Oct 2023 14:29:52 +0000 https://netwoven.com/?p=47746 Introduction The semiconductor manufacturing industry is the backbone of the digital global economy. In most cases the data generated in this industry is highly sensitive and if obtained by competitors… Continue reading How Microsoft Purview Facilitates Regulatory Compliance for Semiconductor Manufacturing Industry 

Introduction

The semiconductor manufacturing industry is the backbone of the digital global economy. In most cases, the data generated in this industry is highly sensitive, and if obtained by competitors or bad actors, including adversary nations, it can cause grave damage to the company’s reputation and its investments in innovation.  

In addition to the economic impact, data breaches in the semiconductor manufacturing industry could also have national security implications. This is because semiconductors are used in many critical infrastructure systems, such as telecommunications, transportation, and power grids.  

As per the latest report:

In 2022, the CHIPS Act was put in place to ensure the United States has secure and reliable access to a supply of semiconductors. To comply with the CHIPS Act and other data security regulations, semiconductor manufacturing companies need to take several steps, including: 

  • Implementing strong access controls to protect sensitive data.
  • Using encryption to protect data in transit and at rest. 
  • Conducting regular security assessments. 
  • Training employees on data security best practices. 

We worked on a project to assess the sensitive content shared by several applications, identify risks, define policies and procedures, and implement a solution to mitigate the risks. The focus of today’s discussion is to highlight how such a project may be undertaken and how a step-by-step approach can be followed to yield comprehensive results.


Ebook: 7 Steps to building a Compliance Based Organization with Microsoft Purview Solutions

This eBook offers a detailed overview of the regulatory landscape, emphasizing the importance of compliance. It discusses common compliance challenges and explains how to implement and use Microsoft Purview to meet regulatory requirements efficiently.

Get the eBook

What are the Steps for implementing a sensitive data compliance project?

The goal was to identify and classify the sensitive information across the organization and to ensure that data shared internally and externally was secure at rest as well as in transit.

1. Risk Assessment

The first step was to identify the sensitive data that is generated and used in the semiconductor manufacturing process. The data targeted for the assessment was related to drawings and specification documents created by the engineering department. The data repositories were identified, and the data storage and security processes were documented.

2. Policies and procedures

Once the sensitive data was identified, Netwoven worked with the client to develop policies and procedures for protecting the drawings and specification documents. The policies and procedures addressed the data classification, security, storage, backup, and encryption of data assets. Some of the policies we developed were to limit access to sensitive data only to the application accounts, applying encryption to the documents at rest and in motion, providing encrypted documents to external and internal users and retracting the access to the documents as needed. Some of the procedures we developed were onboarding external users, content marking of sensitive documents, RBAC on sensitive documents, employee and external user training requirements to handle sensitive documents, etc.       

3. Implementation

Netwoven built the solution to protect the drawings and specification documents shared from 10 different applications with internal and external users. Based on the new procedures defined, Netwoven built automation for protecting the documents based on the metadata (ex. File type, Sensitivity classification, Visibility level etc.) provided by the source systems. Protecting the sensitive data for external users (suppliers and customers) was a challenge that required Netwoven to build a tiered application to manage the access controls for the external parties. 

4. Training

Netwoven built training material to train the client’s employees on the data compliance policies and procedures. This training covered the importance of data security, the risks of data breaches, and the consequences of non-compliance. Netwoven also built a self-help portal that documented FAQs, hosted short videos on how to work through particular topics, and stored training documents for easy access.

5. Monitoring

Netwoven built several reporting solutions to collect, refine, and build dashboards based on the compliance log data collected by the tools. Some of the reports we developed show document encryption progress, document access, vulnerability assessments, and security incidents.

What are the tools and technologies used?

The solution was built using Microsoft Purview compliance tools including Sensitivity labels and encryption, Azure File shares and SharePoint Libraries to store sensitive data, Azure Synapse Analytics to move and run workflows on the content released by source systems and Azure functions to properly secure the Microsoft Purview application.

Conclusion

The aim of this article was to share our experience and the methodology we followed for a successful data protection and compliance project implemented in a semiconductor manufacturing environment. The nuances lie in the correct identification and classification of sensitive data to start with. One needs to be particularly mindful of a usable labeling scheme, policies, and procedures that do not disrupt the business processes at work. The other aspects of adoption, governance, compliance, and reporting need to be in place hand in hand. Comprehensiveness is the key, and I hope the discussion helps.   

The post How Microsoft Purview Facilitates Regulatory Compliance for Semiconductor Manufacturing Industry  appeared first on Netwoven.
