IS2000 | Principles of Information Science

Lecture Notes: Information Processing in R


The R Programming Language

The R programming language is one of the most commonly used languages for ad hoc data analysis in business, particularly when Excel is not sufficient, for example when a data set is too large for a spreadsheet.

Programming with R

Environments:
  • R Console
  • RStudio with R Notebooks
Variables & Identifiers:
  • Variables hold values: scalars, text strings, data frames, results from calls to functions
  • Variables have names that must follow certain rules: a name must start with a letter (or with a dot that is not followed by a digit) and can contain letters, digits, dot (.), and underscore (_).
  • Legal Variable Names:
    • df
    • df2
    • df.txns
    • df_all2017
  • Illegal Variable Names:
    • 2df           -- cannot start with a digit
    • rs$all        -- cannot contain a $; the $ is used to access columns in a dataframe
    • rs#           -- only . and _ are allowed in addition to digits and letters
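The naming rules above can be checked in code. A minimal sketch: base R's make.names() rewrites an invalid name into a valid one, so a name is valid when make.names() leaves it unchanged. The helper is_valid_name is hypothetical and not part of R.

```r
# Hypothetical helper: a name is syntactically valid if make.names() does not alter it
is_valid_name <- function(x) make.names(x) == x

candidates <- c("df", "df2", "df.txns", "df_all2017", "2df", "rs$all", "rs#")
valid <- is_valid_name(candidates)
names(valid) <- candidates
print(valid)

# An invalid name can still be used when quoted in backticks, though this is rarely advisable
`2df` <- 10
print(`2df`)
```

The first four candidates come back TRUE and the last three FALSE, matching the legal and illegal lists above.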
Data & Object Types:
  • Variables are typed in R based on the value they contain
  • The type can change simply by assigning a new value
  • Types can be converted using as.xxxx functions, e.g., as.numeric, as.Date, or as.character
  • Conversions are often necessary when data is read from CSV or XML files or databases
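A short sketch of how types follow values and how the as.xxxx conversions behave. The variable names and values are illustrative only.

```r
x <- 42                      # x holds a number
class(x)                     # "numeric"
x <- "42"                    # the same variable now holds text
class(x)                     # "character"

n <- as.numeric(x)           # character -> numeric
i <- as.integer("7")         # character -> integer
s <- as.character(3.14)      # numeric -> character
d <- as.Date("2017-03-15")   # character -> Date (default format is yyyy-mm-dd)

# A value that cannot be converted becomes NA (R also issues a warning)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)                   # TRUE
```

Values read from CSV, XML, or a database often arrive as text, which is why conversions like these typically come right after loading.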
Variables, Objects, Assignment
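The original code for this part is not reproduced here. The following is a minimal sketch of assignment in R, with made-up names and values.

```r
# Assignment uses <- (the = operator also works at the top level)
rate    <- 0.05                 # scalar (a numeric vector of length 1)
name    <- "Acme Corp"          # text string
amounts <- c(100, 250, 75)      # vector built with c()

# Results of function calls can be stored in variables
total <- sum(amounts)
avg   <- mean(amounts)

# Arithmetic is vectorized: the operation applies to every element
with_interest <- amounts * (1 + rate)

print(total)                    # 425
print(with_interest)
ls()                            # lists the objects defined in the current environment
```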

    

Data Frames

Data frames are essentially spreadsheets: tables of rows and columns in which each row is a data case (an observation) and each column is a variable (an attribute of the case). All values in a column share one type.
Working with Data Frames
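The original code for this part is not reproduced here. A minimal sketch of common data frame operations follows. The column names and values are invented for illustration.

```r
# Build a small data frame; names and values are illustrative only
df <- data.frame(
  name   = c("Ann", "Bob", "Cho"),
  dept   = c("IT", "HR", "IT"),
  salary = c(72000, 58000, 81000),
  stringsAsFactors = FALSE
)

nrow(df)                        # number of rows: 3
ncol(df)                        # number of columns: 3
str(df)                         # structure: column names and types

df$salary                       # a whole column, accessed with $
df[2, ]                         # the second row
df[2, "salary"]                 # a single cell: row 2, column salary

df$bonus <- df$salary * 0.10    # add a derived column
head(df, 2)                     # first two rows
```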

    

Loading Data from Files

Data Sets for code below:
  • Salaries.csv
Load Data from CSV into Dataframe
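The original code for this part is not reproduced here, and the layout of Salaries.csv is not shown. The sketch below therefore writes a small stand-in file first. Its column names (name, dept, salary, hired) are assumptions made for illustration.

```r
# Stand-in for Salaries.csv: the real file's columns are not shown,
# so these column names and rows are assumptions.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,dept,salary,hired",
  "Ann,IT,72000,2015-06-01",
  "Bob,HR,58000,2017-01-15",
  "Cho,IT,81000,2012-09-30"
), csv_path)

# header = TRUE treats the first line as column names
salaries <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)
str(salaries)

# Dates are converted explicitly after loading
salaries$hired <- as.Date(salaries$hired)
summary(salaries$salary)
```

To load the actual data set, pass the path of Salaries.csv to read.csv in place of csv_path.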

    

Queries on Data Frames
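The original query code is not reproduced here. A minimal sketch of filtering, sorting, and aggregating a data frame with base R follows. The data is invented for illustration.

```r
df <- data.frame(
  name   = c("Ann", "Bob", "Cho", "Dev"),
  dept   = c("IT", "HR", "IT", "HR"),
  salary = c(72000, 58000, 81000, 64000),
  stringsAsFactors = FALSE
)

# Rows matching a condition, using the form df[rows, columns]
it_staff <- df[df$dept == "IT", ]

# Combine conditions with & (and) or | (or); select specific columns
high_hr <- df[df$dept == "HR" & df$salary > 60000, c("name", "salary")]

# subset() expresses the same kind of filter more compactly
high_paid <- subset(df, salary > 70000, select = c(name, salary))

# Sort by salary, highest first
ranked <- df[order(-df$salary), ]

# Group and aggregate, similar to GROUP BY in SQL
avg_by_dept <- aggregate(salary ~ dept, data = df, FUN = mean)
print(avg_by_dept)
```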


    
Dealing with Missing Data Values
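The original code for this part is not reproduced here. R marks missing values as NA. The sketch below shows how to detect them and two common ways to handle them (dropping rows and mean imputation), using invented data. Which approach is appropriate depends on the analysis.

```r
df <- data.frame(
  name   = c("Ann", "Bob", "Cho", "Dev"),
  salary = c(72000, NA, 81000, 64000),
  stringsAsFactors = FALSE
)

is.na(df$salary)                  # FALSE TRUE FALSE FALSE
sum(is.na(df$salary))             # number of missing values: 1

mean(df$salary)                   # NA, because one value is missing
mean(df$salary, na.rm = TRUE)     # computed over the non-missing values

# Option 1: drop incomplete rows (keeps the same rows as na.omit(df))
complete <- df[complete.cases(df), ]

# Option 2: impute the missing value with the column mean
imputed <- df
imputed$salary[is.na(imputed$salary)] <- mean(df$salary, na.rm = TRUE)
```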

    

Loading Data from Database into Data Frame with SQL

Data Sets & SQLite database for code below:
  • Salaries.csv | ProjectDB.db
Database Queries in R
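The original code for this part is not reproduced here, and neither the package it used nor the schema of ProjectDB.db is shown. The sketch below assumes the DBI and RSQLite packages (both on CRAN) and uses an in-memory SQLite database with a hypothetical Projects table, so it runs without the course files.

```r
library(DBI)   # generic database interface; RSQLite supplies the SQLite driver

# In-memory database so the sketch is self-contained.
# For a database file, pass its path instead of ":memory:".
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table; the real schema of ProjectDB.db is not shown
projects <- data.frame(
  pid    = 1:3,
  pname  = c("Alpha", "Beta", "Gamma"),
  budget = c(120000, 45000, 80000)
)
dbWriteTable(con, "Projects", projects)
dbListTables(con)

# dbGetQuery() runs a SELECT and returns the result as a data frame
big <- dbGetQuery(con,
  "SELECT pname, budget FROM Projects WHERE budget > 50000 ORDER BY budget DESC")
print(big)

dbDisconnect(con)
```

Because the result is an ordinary data frame, the data frame operations shown earlier in these notes apply to it unchanged.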

    

Loading XML File into R

  • Requires the XML package: install it once from CRAN with install.packages("XML"), then load it in each session with library(XML)
Processing XML in R via XPath
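The original code for this part is not reproduced here. A minimal sketch of XPath queries with the XML package follows. The catalog document is hypothetical and is parsed from a string so that the example is self-contained.

```r
library(XML)

# Hypothetical document; for a file, pass its path to xmlParse() and omit asText
xml_text <- paste0(
  "<catalog>",
  "<book id='b1'><title>R Basics</title><price>29.95</price></book>",
  "<book id='b2'><title>SQL Primer</title><price>35.00</price></book>",
  "<book id='b3'><title>XML in Practice</title><price>19.50</price></book>",
  "</catalog>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# All titles: xmlValue is applied to every node the XPath expression matches
titles <- xpathSApply(doc, "//book/title", xmlValue)

# A predicate on a child element's value
cheap <- xpathSApply(doc, "//book[price < 30]/title", xmlValue)

# Attribute values of the matched nodes
ids <- xpathSApply(doc, "//book", xmlGetAttr, "id")

# Text extracted from XML is character data; convert before doing arithmetic
prices <- as.numeric(xpathSApply(doc, "//book/price", xmlValue))
mean(prices)
```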

    
Processing XML in R via Tree
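The original code for this part is not reproduced here. As an alternative to XPath, the parsed document can be walked as a tree, starting from the root node. This is a minimal sketch with the XML package and a hypothetical catalog document.

```r
library(XML)

# Hypothetical document, parsed from a string
xml_text <- paste0(
  "<catalog>",
  "<book id='b1'><title>R Basics</title><price>29.95</price></book>",
  "<book id='b2'><title>SQL Primer</title><price>35.00</price></book>",
  "</catalog>"
)
doc  <- xmlParse(xml_text, asText = TRUE)
root <- xmlRoot(doc)

xmlName(root)                     # name of the root element: "catalog"
xmlSize(root)                     # number of child nodes

first <- root[[1]]                # first <book> element
xmlGetAttr(first, "id")           # attribute value: "b1"
xmlValue(first[["title"]])        # text of the <title> child: "R Basics"

# Visit every child of the root and collect one field from each
titles <- sapply(xmlChildren(root), function(book) xmlValue(book[["title"]]))

# For flat, record-like XML, xmlToDataFrame() builds a data frame directly
books <- xmlToDataFrame(doc, stringsAsFactors = FALSE)
books$price <- as.numeric(books$price)
```

Tree traversal suits documents that are processed node by node, while XPath is usually shorter when only selected values are needed.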

    
© COPYRIGHT 2016-19 by Martin Schedlbauer
FREE FOR ACADEMIC USE WITH ACKNOWLEDGEMENT AND NOTICE.
ALL RIGHTS RESERVED.