IS2000 | Principles of Information Science
Lecture Notes: Information Processing in R


The R Programming Language

The R programming language is one of the most commonly used languages for ad hoc data analysis in business when Excel is not sufficient or cannot handle large data sets. It is also often used to develop data mining and machine learning solutions, although production code is often rewritten in Java or C++.

Programming with R

Environments:
  • R Console
  • RStudio with R Notebooks
  • RStudio Cloud
Variables & Identifiers:
  • Variables hold values: scalars, text strings, data frames, results from calls to functions
  • Variable names must follow certain rules: they must start with a letter (or a dot not followed by a digit) and can contain only letters, digits, dot (.), and underscore (_)
  • Treating the dot (.) as a regular character can be confusing, as other languages (e.g., Java) use the dot for property or method access
  • Legal Variable Names:
    • df
    • df2
    • df.txns
    • df_all2017
  • Illegal Variable Names:
    • 2df           -- cannot start with a digit
    • rs$all        -- cannot contain a $; the $ is used to access columns in a data frame
    • rs#           -- only . and _ are allowed in addition to digits and letters
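
The naming rules above can be checked from the console. The following is a minimal sketch, not an official validity test: it relies on the base function make.names(), which rewrites any string into a syntactically valid name, and treats a string that comes back unchanged as already valid. The vector name candidates and the variable is_valid are made up for this illustration.

```r
# Candidate names taken from the legal and illegal lists above
candidates <- c("df", "df2", "df.txns", "df_all2017", "2df", "rs$all", "rs#")

# make.names() turns a string into a syntactically valid name;
# if the result equals the input, the input was already valid
is_valid <- candidates == make.names(candidates)

print(data.frame(name = candidates, valid = is_valid))

# An invalid name can still be used when quoted with backticks,
# although this is rarely a good idea
`2df` <- 10
print(`2df`)
```

Note that make.names() also alters reserved words such as if, so an unchanged result means the name is both well-formed and not reserved.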
Data & Object Types:
  • Variables are typed in R based on the value they contain
  • The type can change simply by assigning a new value
  • Types can be converted using as.xxxx functions, e.g., as.numeric, as.Date, or as.character
  • Conversions are often necessary when data is read from CSV or XML files or databases
Variables, Objects, Assignment
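
The original code for this part is not included in these notes. As a stand-in, here is a short sketch of assignment, dynamic typing, and type conversion as described in the bullets above. All variable names and values are invented for illustration.

```r
# Assignment uses the <- operator
x <- 42                  # numeric scalar
name <- "Ann"            # character string
flag <- TRUE             # logical value

class(x)                 # "numeric"
class(name)              # "character"

# The type follows the value: assigning a new value changes the type
x <- "forty-two"
class(x)                 # "character"

# Values read from text files often arrive as strings and need conversion
amount_txt <- "1250.75"
amount <- as.numeric(amount_txt)

date_txt <- "2017-03-15"
d <- as.Date(date_txt)   # the ISO format yyyy-mm-dd is parsed by default

id <- 1001
id_txt <- as.character(id)

# A conversion that cannot succeed yields NA (and normally a warning)
bad <- suppressWarnings(as.numeric("n/a"))
is.na(bad)
```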

    

Data Frames

Data frames are essentially spreadsheets: tables of rows and columns where each row is a data case (observation) and each column is a variable (attribute) whose values all share one type.
Working with Data Frames
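
The original code for this part is not included in these notes. The sketch below shows some common ways to create and inspect a data frame and to access rows, columns, and cells. The data frame df and its contents are hypothetical.

```r
# Hypothetical data: names and values are invented for illustration
df <- data.frame(
  name   = c("Ann", "Bob", "Cara"),
  dept   = c("IT", "HR", "IT"),
  salary = c(72000, 58000, 81000),
  stringsAsFactors = FALSE
)

str(df)                     # structure: column names and types
nrow(df)                    # number of rows (cases)
ncol(df)                    # number of columns (variables)
head(df, 2)                 # first two rows

df$salary                   # one column as a vector
df[2, ]                     # second row
df[2, "salary"]             # a single cell
df[, c("name", "salary")]   # a subset of columns

# Add a derived column
df$monthly <- df$salary / 12

# Append a row; the new row must have the same columns
df <- rbind(df, data.frame(name = "Dev", dept = "HR",
                           salary = 60000, monthly = 60000 / 12,
                           stringsAsFactors = FALSE))
```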

    

Loading Data from Files

Data Sets for code below:
  • Salaries.csv
Load Data from CSV into Data Frame
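
The original code for this part is not included in these notes, and the columns of Salaries.csv are not shown here. The sketch below therefore writes a small stand-in CSV file to a temporary location and loads it with read.csv(). The column names (name, dept, salary, hired) and all values are assumptions made for illustration; for the real file, pass its path to read.csv() instead.

```r
# Stand-in for Salaries.csv: the columns and values here are hypothetical
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,dept,salary,hired",
  "Ann,IT,72000,2015-06-01",
  "Bob,HR,58000,2017-01-15",
  "Cara,IT,81000,2012-09-30"
), csv_path)

# read.csv() returns a data frame; header = TRUE takes column names
# from the first line of the file
salaries <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

str(salaries)

# Dates arrive as text and need an explicit conversion
salaries$hired <- as.Date(salaries$hired)
class(salaries$hired)
```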

    

Queries on Data Frames
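
The original code for this part is not included in these notes. The sketch below shows a few common ways to query a data frame in base R: logical indexing, subset(), sorting with order(), and grouping with aggregate(). The data frame df and its contents are hypothetical.

```r
# Hypothetical data: names and values are invented for illustration
df <- data.frame(
  name   = c("Ann", "Bob", "Cara", "Dev"),
  dept   = c("IT", "HR", "IT", "HR"),
  salary = c(72000, 58000, 81000, 60000),
  stringsAsFactors = FALSE
)

# Rows matching a condition (logical indexing)
it_staff <- df[df$dept == "IT", ]

# Combine conditions with & (and) or | (or)
high_it <- df[df$dept == "IT" & df$salary > 75000, ]

# subset() expresses the same query and can also select columns
high_it2 <- subset(df, dept == "IT" & salary > 75000, select = c(name, salary))

# which() returns the row numbers that match
which(df$salary > 60000)

# Sort by salary, highest first
by_salary <- df[order(-df$salary), ]

# Aggregate: mean salary per department
avg_by_dept <- aggregate(salary ~ dept, data = df, FUN = mean)
print(avg_by_dept)
```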

    
Dealing with Missing Data Values
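
The original code for this part is not included in these notes. R marks missing values as NA. The sketch below shows how to detect them, how to exclude them from a calculation, how to drop incomplete rows, and one simple way to fill them in (replacing with the mean, which is only one of several possible strategies). The data frame and its values are hypothetical.

```r
# Hypothetical data with missing values (NA)
df <- data.frame(
  name   = c("Ann", "Bob", "Cara", "Dev"),
  dept   = c("IT", "HR", NA, "HR"),
  salary = c(72000, NA, 81000, 60000),
  stringsAsFactors = FALSE
)

is.na(df$salary)                 # TRUE where the value is missing
sum(is.na(df$salary))            # number of missing salaries
colSums(is.na(df))               # missing values per column

mean(df$salary)                  # NA: any NA makes the result NA
mean(df$salary, na.rm = TRUE)    # NA values are skipped

# Keep only rows without any missing value
complete <- df[complete.cases(df), ]

# Replace missing salaries with the mean of the known salaries
imputed <- df
imputed$salary[is.na(imputed$salary)] <- mean(df$salary, na.rm = TRUE)
```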

    

Loading Data from Database into Data Frame with SQL

Data Sets & SQLite database for code below:
  • Salaries.csv | ProjectDB.db
Database Queries in R
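
The original code for this part is not included in these notes. The sketch below shows one possible way to run SQL from R, assuming the CRAN packages DBI and RSQLite are installed. Because the schema of ProjectDB.db is not shown here, the example uses an in-memory SQLite database with a hypothetical employees table; to query the real file, pass its path to dbConnect() instead of ":memory:".

```r
library(DBI)   # assumption: DBI and RSQLite are installed from CRAN

# In-memory SQLite database as a stand-in for ProjectDB.db
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table: names and values are invented for illustration
employees <- data.frame(
  name   = c("Ann", "Bob", "Cara", "Dev"),
  dept   = c("IT", "HR", "IT", "HR"),
  salary = c(72000, 58000, 81000, 60000),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "employees", employees)

dbListTables(con)

# dbGetQuery() runs a SELECT and returns the result as a data frame
result <- dbGetQuery(con, "
  SELECT dept, AVG(salary) AS avg_salary
    FROM employees
   GROUP BY dept
   ORDER BY dept")
print(result)

# Parameterized query: the ? placeholder is filled from params
it <- dbGetQuery(con,
                 "SELECT name, salary FROM employees WHERE dept = ?",
                 params = list("IT"))

dbDisconnect(con)
```

Once the result is in a data frame, it can be processed with the same data frame operations as data loaded from a CSV file.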

    

Loading XML File into R

  • Requires the XML package: install it once from CRAN with install.packages("XML"), then load it in each session with library(XML)
Processing XML in R via XPath
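
The original code for this part is not included in these notes. The sketch below parses a small XML document from a string and extracts values with XPath expressions using xpathSApply() from the XML package. The catalog document, its element names, and its values are hypothetical; to parse a file instead, pass its path to xmlParse() and omit asText.

```r
library(XML)   # assumption: the XML package is installed from CRAN

# Hypothetical XML document, built as a string for illustration
xml_text <- paste0(
  "<catalog>",
  "<book id='b1'><title>R Basics</title><price>30.5</price></book>",
  "<book id='b2'><title>XML Patterns</title><price>45</price></book>",
  "<book id='b3'><title>SQL Queries</title><price>25</price></book>",
  "</catalog>"
)

doc <- xmlParse(xml_text, asText = TRUE)

# Text content of every matching node
titles <- xpathSApply(doc, "//book/title", xmlValue)
prices <- as.numeric(xpathSApply(doc, "//book/price", xmlValue))

# Attribute values of every matching node
ids <- xpathSApply(doc, "//book", xmlGetAttr, "id")

# XPath predicate: titles of books that cost more than 28
expensive <- xpathSApply(doc, "//book[price > 28]/title", xmlValue)

# Combine the extracted vectors into a data frame
books <- data.frame(id = ids, title = titles, price = prices,
                    stringsAsFactors = FALSE)
print(books)
```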

    
Processing XML in R via Tree
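
The original code for this part is not included in these notes. As an alternative to XPath, the parsed document can be walked as a tree, starting at the root node and moving to child nodes by position or by name. The sketch below uses functions from the XML package; the catalog document is hypothetical.

```r
library(XML)   # assumption: the XML package is installed from CRAN

# Hypothetical XML document, built as a string for illustration
xml_text <- paste0(
  "<catalog>",
  "<book id='b1'><title>R Basics</title><price>30.5</price></book>",
  "<book id='b2'><title>XML Patterns</title><price>45</price></book>",
  "<book id='b3'><title>SQL Queries</title><price>25</price></book>",
  "</catalog>"
)

doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)          # top-level element

xmlName(root)                 # name of the root element
n_books <- xmlSize(root)      # number of child nodes

# Children are reached by position, their children by name
first <- root[[1]]
xmlName(first)
xmlGetAttr(first, "id")
xmlValue(first[["title"]])

# Visit every child of the root and collect the titles
titles <- character(0)
for (i in seq_len(n_books)) {
  titles <- c(titles, xmlValue(root[[i]][["title"]]))
}

# For flat, regular documents, xmlToDataFrame() builds a data frame
# with one row per child of the root; values arrive as text
books <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
books$price <- as.numeric(books$price)
```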

    
© COPYRIGHT 2016-20 by Martin Schedlbauer
FREE FOR ACADEMIC USE WITH ACKNOWLEDGEMENT.
ALL RIGHTS RESERVED.