Skip to Main Content
It looks like you're using Internet Explorer 11 or older. This website works best with modern browsers such as the latest versions of Chrome, Firefox, Safari, and Edge. If you continue with this browser, you may see unexpected results.
Welcome to your Information Engineering (Software Engineering pathway only) reading list! Here you will find the resources to support you throughout your module.
Data Smart by Data Science gets thrown around in the press like it's magic. Major retailers are predicting everything from when their customers are pregnant to when they want a new pair of Chuck Taylors. It's a brave new world where seemingly meaningless data can be transformed into valuable insight to drive smart business decisions. But how does one exactly do data science? Do you have to hire one of these priests of the dark arts, the "data scientist," to extract this gold from your data? Nope. Data science is little more than using straight-forward steps to process raw data into actionable insight. And in Data Smart, author and data scientist John Foreman will show you how that's done within the familiar environment of a spreadsheet. Why a spreadsheet? It's comfortable! You get to look at the data every step of the way, building confidence as you learn the tricks of the trade. Plus, spreadsheets are a vendor-neutral place to learn data science without the hype. But don't let the Excel sheets fool you. This is a book for those serious about learning the analytic techniques, the math and the magic, behind big data. Each chapter will cover a different technique in a spreadsheet so you can follow along: Mathematical optimization, including non-linear programming and genetic algorithms Clustering via k-means, spherical k-means, and graph modularity Data mining in graphs, such as outlier detection Supervised AI through logistic regression, ensemble models, and bag-of-words models Forecasting, seasonal adjustments, and prediction intervals through monte carlo simulation Moving from spreadsheets into the R programming language You get your hands dirty as you work alongside John through each technique. But never fear, the topics are readily applicable and the author laces humor throughout. You'll even learn what a dead squirrel has to do with optimization modeling, which you no doubt are dying to know.
Call Number: eBook
Publication Date: 2013
The Art of R Programming by R is the world’s most popular programming language for statistical computing. Drug developers use it to evaluate clinical trials and determine which medications are safe and effective; archaeologists use it to sift through mounds of artifacts and track the spread of ancient civilizations; and actuaries use it to assess financial risks and keep economies running smoothly. In The Art of R Programming, veteran author Norman Matloff takes readers on a guided tour of this powerful language, from basic object types and data structures to graphing, parallel processing, and much more. Along the way, readers learn about topics including functional and object-oriented programming, low-level code optimization, and interfacing R with C++ and Python. Whether readers are doing academic research, designing aircraft, or forecasting the weather, R is the tool of choice for statistical application development, and The Art of R Programming is the definitive guide to learning R.
Call Number: eBook
Publication Date: 2011
Hadoop by Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. Using Hadoop 2 exclusively, author Tom White presents new chapters on YARN and several Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. You'll learn about recent changes to Hadoop, and explore new case studies on Hadoop's role in healthcare systems and genomics data processing. Learn fundamental components such as MapReduce, HDFS, and YARN Explore MapReduce in depth, including steps for developing applications with it Set up and maintain a Hadoop cluster running HDFS and MapReduce on YARN Learn two data formats: Avro for data serialization and Parquet for nested data Use data ingestion tools such as Flume (for streaming data) and Sqoop (for bulk data transfer) Understand how high-level data processing tools like Pig, Hive, Crunch, and Spark work with Hadoop Learn the HBase distributed database and the ZooKeeper distributed configuration service
Call Number: eBook
Publication Date: 2015
Software for Data Analysis by John Chambers turns his attention to R, the enormously successful open-source system based on the S language. His book guides the reader through programming with R, beginning with simple interactive use and progressing by gradual stages, starting with simple functions. More advanced programming techniques can be added as needed, allowing users to grow into software contributors, benefiting their careers and the community. R packages provide a powerful mechanism for contributions to be organized and communicated. This is the only advanced programming book on R, written by the author of the S language from which R evolved.
Call Number: eBook
Publication Date: 2009
Mining of Massive Data Sets by Written by leading authorities in database and Web technologies, this book is essential reading for students and practitioners alike. The popularity of the Web and Internet commerce provides many extremely large datasets from which information can be gleaned by data mining. This book focuses on practical algorithms that have been used to solve key problems in data mining and can be applied successfully to even the largest datasets. It begins with a discussion of the MapReduce framework, an important tool for parallelizing algorithms automatically. The authors explain the tricks of locality-sensitive hashing and stream-processing algorithms for mining data that arrives too fast for exhaustive processing. Other chapters cover the PageRank idea and related tricks for organizing the Web, the problems of finding frequent itemsets, and clustering. This third edition includes new and extended coverage on decision trees, deep learning, and mining social-network graphs.
Call Number: 006.312 LES
Publication Date: 2020
The Intelligent Web by As we use the Web for social networking, shopping, and news, we leave a personal trail. These days, linger over a Web page selling lamps, and they will turn up at the advertising margins as you move around the Internet, reminding you, tempting you to make that purchase. Search engines such asGoogle can now look deep into the data on the Web to pull out instances of the words you are looking for. And there are pages that collect and assess information to give you a snapshot of changing political opinion. These are just basic examples of the growth of "Web intelligence", as increasinglysophisticated algorithms operate on the vast and growing amount of data on the Web, sifting, selecting, comparing, aggregating, correcting; following simple but powerful rules to decide what matters. While original optimism for Artificial Intelligence declined, this new kind of machine intelligenceis emerging as the Web grows ever larger and more interconnected. Gautam Shroff takes us on a journey through the computer science of search, natural language, text mining, machine learning, swarm computing, and semantic reasoning, from Watson to self-driving cars. This machine intelligence may even mimic at a basic level what happens in the brain.
Call Number: eBook
Publication Date: 2014