Data Mining Training Courses

Data Mining Training

Data Mining refers to the process of automatically searching large data sets to discover patterns and useful information.

NobleProg onsite live Data Mining training courses demonstrate through hands-on practice the fundamentals of Data Mining, its sources of methods including Artificial intelligence, Machine learning, Statistics and Database systems, and its use and applications.

Data Mining training is available in various formats, including onsite live training and live instructor-led training using an interactive, remote desktop setup. Local Data Mining training can be carried out live on customer premises or in NobleProg local training centers.

Client Testimonials

Subcategories

Data Mining Course Outlines

Code Name Duration Overview
osqlide Oracle SQL Intermediate - Data Extraction 14 hours
TalendDI Talend Open Studio for Data Integration 28 hours Talend Open Studio for Data Integration is an open-source data integration product used to combine, convert and update data in various locations across a business. In this instructor-led, live training, participants will learn how to use the Talend ETL tool to carry out data transformation, data extraction, and connectivity with Hadoop, Hive, and Pig.   By the end of this training, participants will be able to Explain the concepts behind ETL (Extract, Transform, Load) and propagation Define ETL methods and ETL tools to connect with Hadoop Efficiently amass, retrieve, digest, consume, transform and shape big data in accordance to business requirements Upload to and extract large records from Hadoop, Hive, and NoSQL databases Audience Business intelligence professionals Project managers Database professionals SQL Developers ETL Developers Solution architects Data architects Data warehousing professionals System administrators and integrators Format of the course Part lecture, part discussion, exercises and heavy hands-on practice
dmmlr Data Mining & Machine Learning with R 14 hours R is an open-source free programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
PentahoDI Pentaho Data Integration Fundamentals 21 hours Pentaho Data Integration is an open-source data integration tool for defining jobs and data transformations. In this instructor-led, live training, participants will learn how to use Pentaho Data Integration's powerful ETL capabilities and rich GUI to manage an entire big data lifecycle, maximizing the value of data to the organization. By the end of this training, participants will be able to: Create, preview, and run basic data transformations containing steps and hops Configure and secure the Pentaho Enterprise Repository Harness disparate sources of data and generate a single, unified version of the truth in an analytics-ready format. Provide results to third-part applications for further processing Audience Data Analyst ETL developers Format of the course Part lecture, part discussion, exercises and heavy hands-on practice
rprogda R Programming for Data Analysis 14 hours This course is part of the Data Scientist skill set (Domain: Data and Technology)
datavault Data Vault: Building a Scalable Data Warehouse 28 hours Data vault modeling is a database modeling technique that provides long-term historical storage of data that originates from multiple sources. A data vault stores a single version of the facts, or "all the data, all of the time". Its flexible, scalable, consistent and adaptable design encompasses the best aspects of 3rd normal form (3NF) and star schema. In this instructor-led, live training, participants will learn how to build a Data Vault. By the end of this training, participants will be able to: Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI. Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse Develop a consistent and repeatable ETL (Extract, Transform, Load) process Build and deploy highly scalable and repeatable warehouses Audience Data modelers Data warehousing specialist Business Intelligence specialists Data engineers Database administrators Format of the course Part lecture, part discussion, exercises and heavy hands-on practice
sspsspas Statistics with SPSS Predictive Analytics Software 14 hours Goal: Learning to work with SPSS at the level of independence The addressees: Analysts, researchers, scientists, students and all those who want to acquire the ability to use SPSS package and learn popular data mining techniques.
bigddbsysfun Big Data & Database Systems Fundamentals 14 hours The course is part of the Data Scientist skill set (Domain: Data and Technology).
monetdb MonetDB 28 hours MonetDB is an open-source database that pioneered the column-store technology approach. In this instructor-led, live training, participants will learn how to use MonetDB and how to get the most value out of it. By the end of this training, participants will be able to: Understand MonetDB and its features Install and get started with MonetDB Explore and perform different functions and tasks in MonetDB Accelerate the delivery of their project by maximizing MonetDB capabilities Audience Developers Technical experts Format of the course Part lecture, part discussion, exercises and heavy hands-on practice
mdlmrah Model MapReduce and Apache Hadoop 14 hours The course is intended for IT specialist that works with the distributed processing of large data sets across clusters of computers.
datavis1 Data Visualization 28 hours This course is intended for engineers and decision makers working in data mining and knoweldge discovery. You will learn how to create effective plots and ways to present and represent your data in a way that will appeal to the decision makers and help them to understand hidden information.
danagr Data and Analytics - from the ground up 42 hours Data analytics is a crucial tool in business today. We will focus throughout on developing skills for practical hands on data analysis. The aim is to help delegates to give evidence-based answers to questions:  What has happened? processing and analyzing data producing informative data visualizations What will happen? forecasting future performance evaluating forecasts What should happen? turning data into evidence-based business decisions optimizing processes The course itself can be delivered either as a 6 day classroom course or remotely over a period of weeks if preferred. We can work with you to deliver the course to best suit your needs.
bdbiga Big Data Business Intelligence for Govt. Agencies 35 hours Advances in technologies and the increasing amount of information are transforming how business is conducted in many industries, including government. Government data generation and digital archiving rates are on the rise due to the rapid growth of mobile devices and applications, smart sensors and devices, cloud computing solutions, and citizen-facing portals. As digital information expands and becomes more complex, information management, processing, storage, security, and disposition become more complex as well. New capture, search, discovery, and analysis tools are helping organizations gain insights from their unstructured data. The government market is at a tipping point, realizing that information is a strategic asset, and government needs to protect, leverage, and analyze both structured and unstructured information to better serve and meet mission requirements. As government leaders strive to evolve data-driven organizations to successfully accomplish mission, they are laying the groundwork to correlate dependencies across events, people, processes, and information. High-value government solutions will be created from a mashup of the most disruptive technologies: Mobile devices and applications Cloud services Social business technologies and networking Big Data and analytics IDC predicts that by 2020, the IT industry will reach $5 trillion, approximately $1.7 trillion larger than today, and that 80% of the industry's growth will be driven by these 3rd Platform technologies. In the long term, these technologies will be key tools for dealing with the complexity of increased digital information. Big Data is one of the intelligent industry solutions and allows government to make better decisions by taking action based on patterns revealed by analyzing large volumes of data — related and unrelated, structured and unstructured. But accomplishing these feats takes far more than simply accumulating massive quantities of data.“Making sense of thesevolumes of Big Datarequires cutting-edge tools and technologies that can analyze and extract useful knowledge from vast and diverse streams of information,” Tom Kalil and Fen Zhao of the White House Office of Science and Technology Policy wrote in a post on the OSTP Blog. The White House took a step toward helping agencies find these technologies when it established the National Big Data Research and Development Initiative in 2012. The initiative included more than $200 million to make the most of the explosion of Big Data and the tools needed to analyze it. The challenges that Big Data poses are nearly as daunting as its promise is encouraging. Storing data efficiently is one of these challenges. As always, budgets are tight, so agencies must minimize the per-megabyte price of storage and keep the data within easy access so that users can get it when they want it and how they need it. Backing up massive quantities of data heightens the challenge. Analyzing the data effectively is another major challenge. Many agencies employ commercial tools that enable them to sift through the mountains of data, spotting trends that can help them operate more efficiently. (A recent study by MeriTalk found that federal IT executives think Big Data could help agencies save more than $500 billion while also fulfilling mission objectives.). Custom-developed Big Data tools also are allowing agencies to address the need to analyze their data. For example, the Oak Ridge National Laboratory’s Computational Data Analytics Group has made its Piranha data analytics system available to other agencies. The system has helped medical researchers find a link that can alert doctors to aortic aneurysms before they strike. It’s also used for more mundane tasks, such as sifting through résumés to connect job candidates with hiring managers.
dsbda Data Science for Big Data Analytics 35 hours Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.
d2dbdpa From Data to Decision with Big Data and Predictive Analytics 21 hours Audience If you try to make sense out of the data you have access to or want to analyse unstructured data available on the net (like Twitter, Linked in, etc...) this course is for you. It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analyzing. It is not aimed at people configuring the solution, those people will benefit from the big picture though. Delivery Mode During the course delegates will be presented with working examples of mostly open source technologies. Short lectures will be followed by presentation and simple exercises by the participants Content and Software used All software used is updated each time the course is run so we check the newest versions possible. It covers the process from obtaining, formatting, processing and analysing the data, to explain how to automate decision making process with machine learning.
datapro Data Protection 35 hours This is an Instructor led course, and is the non-certification version of the "CDP - Certificate in Data Protection" course Those experienced in data protection issues, as well as those new to the subject, need to be trained so that their organisations are confident that legal compliance is continually addressed. It is necessary to identify issues requiring expert data protection advice in good time in order that organisational reputation and credibility are enhanced through relevant data protection policies and procedures. Objectives: The aim of the syllabus is to promote an understanding of how the data protection principles work rather than simply focusing on the mechanics of regulation. The syllabus places the Act in the context of human rights and promotes good practice within organisations. On completion you will have: an appreciation of the broader context of the Act.  an understanding of the way in which the Act and the Privacy and Electronic Communications (EC Directive) Regulations 2003 work a broad understanding of the way associated legislation relates to the Act an understanding of what has to be done to achieve compliance Course Synopsis: The syllabus comprises three main parts, each sub-sections. Context - this will address the origins of and reasons for the Act together with consideration of privacy in general. Law – Data Protection Act - this will address the main concepts and elements of the Act and subordinate legislation. Application - this will consider how compliance is achieved and how the Act works in practice.
psr Introduction to Recommendation Systems 7 hours Audience Marketing department employees, IT strategists and other people involved in decisions related to the design and implementation of recommender systems. Format Short theoretical background follow by analysing working examples and short, simple exercises.
neo4j Beyond the relational database: neo4j 21 hours Relational, table-based databases such as Oracle and MySQL have long been the standard for organizing and storing data. However, the growing size and fluidity of data have made it difficult for these traditional systems to efficiently execute highly complex queries on the data. Imagine replacing rows-and-columns-based data storage with object-based data storage, whereby entities (e.g., a person) could be stored as data nodes, then easily queried on the basis of their vast, multi-linear relationship with other nodes. And imagine querying these connections and their associated objects and properties using a compact syntax, up to 20 times lighter than SQL. This is what graph databases, such as neo4j offer. In this hands-on course, we will set up a live project and put into practice the skills to model, manage and access your data. We contrast and compare graph databases with SQL-based databases as well as other NoSQL databases and clarify when and where it makes sense to implement each within your infrastructure. Audience Database administrators (DBAs) Data analysts Developers System Administrators DevOps engineers Business Analysts CTOs CIOs Format of the course Heavy emphasis on hands-on practice. Most of the concepts are learned through samples, exercises and hands-on development.
68780 Apache Spark 14 hours
processmining Process Mining 21 hours Process mining, or Automated Business Process Discovery (ABPD), is a technique that applies algorithms to event logs for the purpose of analyzing business processes. Process mining goes beyond data storage and data analysis; it bridges data with processes and provides insights into the trends and patterns that affect process efficiency.  Format of the course     The course starts with an overview of the most commonly used techniques for process mining. We discuss the various process discovery algorithms and tools used for discovering and modeling processes based on raw event data. Real-life case studies are examined and data sets are analyzed using the ProM open-source framework. Audience     Data science professionals     Anyone interested in understanding and applying process modeling and data mining
pmml Predictive Models with PMML 7 hours The course is created to scientific, developers, analysts or any other people who want to standardize or exchange their models with Predictive Model Markup Language (PMML) file format.
kdd Knowledge Discover in Databases (KDD) 21 hours Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing. In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes. Audience     Data analysts or anyone interested in learning how to interpret data to solve problems Format of the course     After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations.
dataminr Data Mining with R 14 hours R is an open-source free programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.
druid Druid: Build a fast, real-time data analysis system 21 hours Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end-users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, Paypal, and Yahoo. In this course we explore some of the limitations of data warehouse solutions and discuss how Druid can compliment those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment. Audience     Application developers     Software engineers     Technical consultants     DevOps professionals     Architecture engineers Format of the course     Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding
datamin Data Mining 21 hours Course can be provided with any tools, including free open-source data mining software and applications
BigData_ A practical introduction to Data Analysis and Big Data 35 hours Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools. Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class. The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools and infrastructure that enable Big Data storage, Distributed Processing, and Scalability. Audience Developers / programmers IT consultants Format of the course Part lecture, part discussion, hands-on practice and implementation, occasional quizing to measure progress.
datashrinkgov Data Shrinkage for Government 14 hours
ApHadm1 Apache Hadoop: Manipulation and Transformation of Data Performance 21 hours This course is intended for developers, architects, data scientists or any profile that requires access to data either intensively or on a regular basis. The major focus of the course is data manipulation and transformation. Among the tools in the Hadoop ecosystem this course includes the use of Pig and Hive both of which are heavily used for data transformation and manipulation. This training also addresses performance metrics and performance optimisation. The course is entirely hands on and is punctuated by presentations of the theoretical aspects.
matlab2 MATLAB Fundamentals 21 hours This three-day course provides a comprehensive introduction to the MATLAB technical computing environment. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include:     Working with the MATLAB user interface     Entering commands and creating variables     Analyzing vectors and matrices     Visualizing vector and matrix data     Working with data files     Working with data types     Automating commands with scripts     Writing programs with logic and flow control     Writing functions
scilab Scilab 14 hours Scilab is a well-developed, free, and open-source high-level language for scientific data manipulation. Used for statistics, graphics and animation, simulation, signal processing, physics, optimization, and more, its central data structure is the matrix, simplifying many types of problems compared to alternatives such as FORTRAN and C derivatives. It is compatible with languages such as C, Java, and Python, making it suitable as for use as a supplement to existing systems. In this instructor-led training, participants will learn the advantages of Scilab compared to alternatives like Matlab, the basics of the Scilab syntax as well as some advanced functions, and interface with other widely used languages, depending on demand. The course will conclude with a brief project focusing on image processing. By the end of this training, participants will have a grasp of the basic functions and some advanced functions of Scilab, and have the resources to continue expanding their knowledge. Audience Data scientists and engineers, especially with interest in image processing and facial recognition Format of the course Part lecture, part discussion, exercises and intensive hands-on practice, with a final project
datama Data Mining and Analysis 28 hours Objective: Delegates be able to analyse big data sets, extract patterns, choose the right variable impacting the results so that a new model is forecasted with predictive results.
matlabfundamentalsfinance MATLAB Fundamentals + MATLAB for Finance 35 hours This course provides a comprehensive introduction to the MATLAB technical computing environment + an introduction to using MATLAB for financial applications. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include: Working with the MATLAB user interface Entering commands and creating variables Analyzing vectors and matrices Visualizing vector and matrix data Working with data files Working with data types Automating commands with scripts Writing programs with logic and flow control Writing functions Using the Financial Toolbox for quantitative analysis
rintrob Introductory R for Biologists 28 hours R is an open-source free programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has also found followers among statisticians, engineers and scientists without computer programming skills who find it easy to use. Its popularity is due to the increasing use of data mining for various goals such as set ad prices, find new drugs more quickly or fine-tune financial models. R has a wide variety of packages for data mining.
matlabdsandreporting MATLAB Fundamentals, Data Science & Report Generation 126 hours In the first part of this training, we cover the fundamentals of MATLAB and its function as both a language and a platform.  Included in this discussion is an introduction to MATLAB syntax, arrays and matrices, data visualization, script development, and object-oriented principles. In the second part, we demonstrate how to use MATLAB for data mining, machine learning and predictive analytics. To provide participants with a clear and practical perspective of MATLAB's approach and power, we draw comparisons between using MATLAB and using other tools such as spreadsheets, C, C++, and Visual Basic. In the third part of the training, participants learn how to streamline their work by automating their data processing and report generation. Throughout the course, participants will put into practice the ideas learned through hands-on exercises in a lab environment. By the end of the training, participants will have a thorough grasp of MATLAB's capabilities and will be able to employ it for solving real-world data science problems as well as for streamlining their work through automation. Assessments will be conducted throughout the course to gauge progress. Format of the course Course includes theoretical and practical exercises, including case discussions, sample code inspection, and hands-on implementation. Note Practice sessions will be based on pre-arranged sample data report templates. If you have specific requirements, please contact us to arrange.

Other regions

Weekend Data Mining courses, Evening Data Mining training, Data Mining boot camp, Data Mining instructor-led , Data Mining one on one training ,Weekend Data Mining training, Data Mining instructor, Data Mining on-site, Data Mining training courses, Data Mining trainer , Evening Data Mining courses, Data Mining classes, Data Mining private courses

Course Discounts Newsletter

We respect the privacy of your email address. We will not pass on or sell your address to others.
You can always change your preferences or unsubscribe completely.

Some of our clients

Outlines Extract
Machine-generated

Data mining system exercise the use of engineers and activities in the course provided to include a component of content types of the interfaces and training the management and the compliance and d. Process in the components conditional developers settings for a development of the test data controls the development the process and application and services security and the course ana. Data reports in the server models and explain the business programming and management to as a security server 10 model and management of services processing console and analysis. Requirements setting of the development of the demoting client data programming and the control in a subject lab : data creating the goals of the control and practices and design of the. The service to real contract processes control of real in the structure of the course the list of the capability to the configure a marketing the basic configuring service in the course.