1.2 Prepare your data – Do’s and don’ts in data preparation for R


When preparing your data for use in R, there are several things to keep in mind about what R can understand, and what it does and does not handle well, in a data file you want to import.

The data we will work with is organized in a spreadsheet. The top row should contain the variable names (metadata, environmental variables and/or response variables) as column names/headers, and each variable has its own column.

The site or sample number (sample ID) should be in the first column, where it serves as the row name. In general, each row represents one observation. Sometimes, like in the example data, there is no special ID, but a combination of date, time, gear and depth clearly identifies each sample in each row. Unique sample IDs can be created in R using this information.
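As a minimal sketch of how such IDs could be built, the example below pastes the identifying columns together. The data frame and its column names (date, time, gear, depth) are invented for illustration and are not the course data.

```r
# Hypothetical example data: no ID column, but date, time, gear and depth
# together identify each sample
samples <- data.frame(
  date  = c("01/01/2019", "01/01/2019", "02/01/2019"),
  time  = c("08:00", "08:00", "09:30"),
  gear  = c("net", "bottle", "net"),
  depth = c(5, 5, 10),
  stringsAsFactors = FALSE
)

# Combine the identifying columns into one string per row
id <- paste(samples$date, samples$time, samples$gear, samples$depth, sep = "_")

# Drop the "/" and ":" characters so the IDs contain no special characters
samples$sample_id <- gsub("[^A-Za-z0-9_]", "", id)

# Stop with an error if two rows ended up with the same ID
stopifnot(anyDuplicated(samples$sample_id) == 0)

# Use the IDs as row names
rownames(samples) <- samples$sample_id
```

If the check fails, the chosen columns do not identify each sample uniquely and another column needs to be added to the combination.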

 

Row and column names should be unique! R is case sensitive: “Name” and “name” are different.
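One possible way to check this before importing is sketched below. The column names are made up for the example.

```r
# Hypothetical column names as they might appear in a header row
col_names <- c("Depth", "depth", "Temp", "Temp")

# Exact duplicates: duplicated() flags the second and later occurrences
exact_dups <- col_names[duplicated(col_names)]

# Names that differ only in case are distinct to R but easy to mix up;
# comparing the lower-cased names reveals them as well
case_dups <- col_names[duplicated(tolower(col_names))]

print(exact_dups)
print(case_dups)
```

Here "Temp" is an exact duplicate, while "depth" is only caught by the case-insensitive check.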

There should not be any special characters (/ ! # & % ? > < { } ( ) and so on) in your spreadsheet, because R uses them as part of its code and can misinterpret them later on. Only underscore “_” or dot “.” can be used.

You should not have spaces in your variable names or other cells. If you would like to have a space, use a “.” or “_” instead, e.g. Chl_small or Chl.big. Also avoid blank rows in your data sheet.
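If a header already contains spaces or special characters, it can be cleaned up in R. The sketch below is one way to do it with a regular expression; the names in raw_names are invented for illustration.

```r
# Hypothetical header names containing spaces and special characters
raw_names <- c("Chl small", "Temp (C)", "Depth/m", "Salinity")

# Replace every run of characters other than letters, digits, "_" and "."
# with a single underscore
clean_names <- gsub("[^A-Za-z0-9_.]+", "_", raw_names)

# Remove underscores left over at the start or end of a name
clean_names <- gsub("^_+|_+$", "", clean_names)

print(clean_names)
```

Base R also provides make.names(), which turns invalid characters into dots instead of underscores.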

I like using longer column names in my spreadsheets, which tell me precisely what each column means and which can be altered in R later for plotting/tables. Others prefer to use abbreviations as variable names from the start and add a sheet explaining the abbreviations to their Excel file. Choose what you like best.

Variable names need to start with a letter. Sample IDs can be numbers, number-letter combinations or letter combinations.
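A quick check for this rule could look like the sketch below, using made-up variable names.

```r
# Hypothetical variable names; two of them do not start with a letter
var_names <- c("Chl_small", "2019_temp", "_depth")

# TRUE where the first character is a letter
starts_with_letter <- grepl("^[A-Za-z]", var_names)

# Names that should be changed in the spreadsheet
bad_names <- var_names[!starts_with_letter]
print(bad_names)
```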

If you have cells without an observation, write “NA”.
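The reason for writing “NA” is that R treats it as a missing value when the file is read, so the column stays numeric. The small sketch below uses inline text instead of a file; the values are invented.

```r
# Hypothetical data with two missing observations written as NA
csv_text <- "sample,temp,chl
s1,4.2,0.8
s2,NA,1.1
s3,5.0,NA"

# read.csv() interprets the string NA as a missing value by default
dat <- read.csv(text = csv_text)

# Count the missing values per column
print(colSums(is.na(dat)))

# Many functions need na.rm = TRUE to skip missing values
mean_temp <- mean(dat$temp, na.rm = TRUE)
print(mean_temp)
```

Without na.rm = TRUE, mean() returns NA as soon as one value is missing.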

If you use dates, write years with four digits, e.g. 01/01/2019 (not 01/01/19).
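A short sketch of how such dates can be converted in R follows. It assumes the order day/month/year, which the example 01/01/2019 does not reveal, so adjust the format string to your own data.

```r
# Hypothetical dates written with four-digit years
date_text <- c("01/01/2019", "15/06/2019")

# Assumption: the order is day/month/year; %Y stands for a four-digit year
dates <- as.Date(date_text, format = "%d/%m/%Y")
print(dates)

# With two-digit years (%y) R has to guess the century: according to the
# strptime documentation, 00 to 68 become 20xx and 69 to 99 become 19xx
old_date <- as.Date("01/01/69", format = "%d/%m/%y")
print(old_date)
```

Four-digit years avoid this guessing altogether.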

Do not have comments in your data sheet.

Export the spreadsheet as .csv (comma-separated values file) or as .txt (tab-delimited text file). If you use .csv, be aware that using “,” as the decimal separator clashes with the comma that separates the fields, so the file may be read incorrectly.
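The sketch below shows one way to read both kinds of export. It writes two tiny files with invented values to a temporary folder so that it runs on its own. It also assumes that a spreadsheet set to decimal commas exports with “;” between fields, which is common but depends on your settings.

```r
# Write two small hypothetical files to a temporary folder
csv_file <- tempfile(fileext = ".csv")
txt_file <- tempfile(fileext = ".txt")
writeLines(c("sample;temp;chl", "s1;4,2;0,8", "s2;5,0;1,1"), csv_file)
writeLines(c("sample\ttemp\tchl", "s1\t4.2\t0.8", "s2\t5.0\t1.1"), txt_file)

# Semicolon-separated fields with decimal commas:
# read.csv2() uses sep = ";" and dec = "," by default
dat_csv <- read.csv2(csv_file)

# Tab-delimited text with decimal points:
# read.delim() uses sep = "\t" and dec = "." by default
dat_txt <- read.delim(txt_file)

# Both imports should give numeric columns with the same values
str(dat_csv)
print(all.equal(dat_csv, dat_txt))
```

For a .csv file with commas between fields and decimal points, read.csv() is the matching function.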
