Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools (Jeroen Janssens)

 

This practical guide shows you how the command line's adaptability can make you a more effective and successful data scientist. You'll discover how to use simple yet effective command-line tools to acquire, clean, investigate, and model your data efficiently.

Author Jeroen Janssens offers a Docker image containing over 100 Unix power tools that can be used with Windows, macOS, or Linux to get you going.

It won't take you long to realize why the command line is a flexible, scalable, and adaptable tool. Even if you're confident using Python or R to handle data, you'll discover how to substantially enhance your data science workflow by taking advantage of the power of the command line.

This book is ideal for data scientists, analysts, engineers, system administrators, and researchers.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Scrub plain text, CSV, HTML/XML, and JSON data
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow using Drake
  • Create reusable tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines using GNU Parallel
  • Model data with dimensionality reduction, clustering, regression, and classification algorithms

Ebook Details

About the Authors
Jeroen Janssens teaches data science through training and coaching, sometimes through speaking, and occasionally through writing. He currently serves as the CEO of Data Science Seminars, which hosts meetups, inspiration sessions, hackathons, and open enrollment workshops.
Published Date / Year: 2nd edition (September 7, 2021); eBook (Creative Commons Licensed)
License(s): CC BY-ND 4.0
Hardcover: 282 pages
eBook Format: HTML
Language: English
ISBN-10: 1492087912
ISBN-13: 978-1492087915

Similar Programming & Computer Books

Guide avancé d'écriture des scripts Bash - Advanced Bash Scripting Guide
This tutorial assumes no prior knowledge of scripting, but advances quickly to an intermediate or advanced level of instruction while diving into UNIX®-related details. ...
Strategic Foundations of General Equilibrium: Dynamic Matching and Bargaining Games (Douglas Gale)
Since Adam Smith's day, the theory of competition has played a significant role in economic study. This book, written by one of the most eminent modern economic theorists, details...
The Pure Logic Of Choice (Richard D. Fuerle)
A broad theory of economics based on free will is presented in this free programming book. The assumption that humans have free will and the ability to alter physical...
Portfolio Theory and Financial Analyses (Robert Alan Hill)
Whether they involve calculating the return on a portfolio, analyzing portfolio risk, or assessing the effectiveness of the portfolio management process, this free programming book links each of the...
Price Theory: An Intermediate Text (David D. Friedman)
In order to help the reader grasp the economic way of thinking, the author first gives verbal, intuitive explanations of the topics before using graphs and/or calculus to illustrate...
Mathematical Models in Portfolio Analysis (Farida Kachapova)
This free programming book presents the mathematical theory of portfolio modeling in financial mathematics as a coherent whole, with justifications for each step. ...
Handbook of Digital Face Manipulation and Detection: From DeepFakes to Morphing Attacks (Christian Rathgeb, et al)
The first thorough compilation of research on the popular subject of digital face alteration, including DeepFakes, Face Morphing, and Reenactment, is offered in this open access book. ...
NLP - Skills for Learning (Peter Freeth)
This free programming book explores how NLP (Neuro Linguistic Programming) is used in training, education, and instruction. It serves as both an introduction to NLP and a book about...
Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit (Steven Bird, et al)
The Natural Language Toolkit (NLTK) book is updated for Python 3 and NLTK 3 in this online edition from 2015.  
Data Mining in Medical and Biological Research (Eugenia G. Giannopoulou)
The goal of this free programming book is to compile the most recent developments and uses of data mining research from around the globe in the exciting fields of...

Other Programming Books by O'Reilly Media

Mastering Perl/Tk (Steve Lidie, et al)
Perl/Tk is a strong programming language that combines the Tk graphical toolkit with Perl, which is mostly used for system management, web development, and database processing. With Perl/Tk, you...
Java Security (Scott Oaks)
Java Security by Scott Oaks is exceptional in both its technical breadth and readability. It offers a thorough introduction to the Java security architecture and security classes, as well...
O'Reilly® Java AWT Reference (John Zukowski)
The Abstract Window Toolkit (AWT), a sizable collection of classes for creating graphical user interfaces in Java, is completely referenced in the Java AWT Reference. You can make windows,...
Free as in Freedom: Richard Stallman's Crusade for Free Software (Sam Williams)
Free as in Freedom profiles Richard Stallman, the man behind the GNU project, along with the political, social, and economic history of the free software movement. It...
Greasemonkey Hacks: Tips & Tools for Remixing the Web with Firefox (Mark Pilgrim)
For hardcore users who wish to learn Greasemonkey, the hottest new Firefox plugin that enables you to write scripts that modify the web pages you see, this book is...
Hacking Kubernetes: Threat-Driven Analysis and Defense (Andrew Martin, et al)
This useful open-source book provides a threat-based overview of Kubernetes security to help you operate your Kubernetes workloads in a secure and reliable manner. ...
What is Dart? (Kathy Walrath, et al)
This free brief booklet introduces the Google Dart language, libraries, and development resources that support the creation of structured, quick, and maintainable web applications that work in any current...
Natural Language Processing with Python – Analyzing Text with the Natural Language Toolkit (Steven Bird, et al)
The Natural Language Toolkit (NLTK) book is updated for Python 3 and NLTK 3 in this online edition from 2015.  
Cascading Style Sheets: The Definitive Guide (Eric A. Meyer)
The Web Design CD Bookshelf, Version 1.0, includes this book. To put it simply, CSS is a method for separating a document's structure from its presentation. The...
Developing on AWS with C#: A Comprehensive Guide on Using C# to Build Solutions on the AWS Platform (Noah Gift, et al)
You are guided through the process of transitioning your monolithic application to microservices on AWS by this helpful book.  
Managing Projects with GNU Make: The Power of GNU make for Building Anything (Robert Mecklenburg)
One of the most enduring elements of both Unix and other operating systems is the utility known simply as make. Make, which was first developed in the 1970s, is...
Programming Embedded Systems in C and C++ (Michael Barr)
This free programming book's practical, no-nonsense approach will assist you in getting started by providing useful guidance from a person who has been in your position before and wants...
Planning for Big Data: A CIO's Handbook to the Changing Data Landscape (Edd Dumbill)
This free programming book offers a useful, approachable "brief" on the state of Big Data analytics today and how you may profitably use this technology to boost your company's...
Big Data Now: Current Perspectives from O'Reilly Radar (O'Reilly Radar Team)
This free programming book summarizes the report's findings on trends, techniques, applications, and predictions.  
Designing Event-Driven Systems: Concepts and Patterns for Streaming Services with Apache Kafka (Ben Stopford)
In Designing Event-Driven Systems, the author discusses how you can create mission-critical systems using service-based architectures and stream processing tools like Apache Kafka....
Visual Basic 2005: A Developer's Notebook (Matthew MacDonald)
The optimum test track is provided in this free programming book. This practical introduction to VB 2005 will get you up to speed on all the new features of...
Ajax Design Patterns (Michael Mahemoff)
You will learn best practices in this free programming book that will significantly enhance your web development initiatives. It looks at how others have resolved conflicts between design principles...
The Java Reference Library CD Bookshelf, 5 Bestselling Books (O'Reilly & Associates)
The Java Reference Library CD Bookshelf is a Java programmer's dream come true.
The Java Enterprise CD Bookshelf, 7 Bestselling Books on CD-ROM (O'Reilly & Associates)
Any Web browser can read The Java Enterprise CD Bookshelf because it is formatted in HTML. The books are fully cross-referenced and searchable. ...
Docker for Java Developers (Arun Gupta)
This free programming book demonstrates how to speed up the startup and deployment of your Java-based apps while introducing fundamental Docker concepts. You'll discover how Docker containers increase machine...
