3 Accessing and understanding detections data
Before downloading your detection data, please ensure that you have no pending metadata issues through the online Data Issues page
This chapter will begin with an introduction to the structure of the detections database, followed by instructions on how to download and access the data. At the end, a summary section that includes a script to download, select variables, clean data, and export is provided (see 3.13)
3.1 Data structure
Each tag detection database is stored as an SQLite (‘dplyr::src_sqlite’) file with the extension ‘.motus’. The sqlite format was chosen because:
- it is flexible, allowing for many data formats.
- it is accessible from many software platforms (not just R).
- it is appendable: the database can be created and updated on disk without having to read in and resave the entire contents. This will save time and computer memory when searching to see if any new detections are available for your project or receiver.
The .motus file contains a series of interrelated tables where data are stored in a condensed format to save memory. The following tables are included in your .motus file;
- antDeps: metadata related to antenna deployments, e.g., deployment height, angle, antenna type.
- batchRuns: metadata for runIDs and associated batchIDs
- batches: detection data for a given receiver and boot number.
- filters: metadata related to user created filters associated with the specified receiver.
- gps: metadata related to Geographic Positioning System (GPS) position of receiver.
- hits: detection data at the level of individual hits.
- meta: metadata related to the project and datatype (tags vs. receivers) that are included in the .motus file
- projAmbig: metadata related to what projects have ambiguous tag detections
- projBatch: metadata for the number of detections contained in each batch
- projs: metadata related to projects, e.g., project name, principal investigator.
- recvDeps: metadata related to receiver deployments, e.g., deployment date, location, receiver characteristics.
- recvs: metadata related to receiver serial number and associated Motus deviceID
- runs: detection data associated with a run (continuous detections of a unique tag on a given receiver).
- runsFilters: a list of runIDs associated with user created filters and assigned probabilities.
- species: metadata related to species, e.g., unique identifier, scientific name, common name.
- tagAmbig: metadata related to ambiguous tags, e.g., ambigID and associated motusTagID
- tagDeps: metadata related to tag deployments, e.g., deployment date, location, and species.
- tags: metadata related to tags, e.g., unique identifier, tag characteristics (e.g., burst interval).
In addition to these tables, there are also ‘virtual’ tables or ‘views’, which have been created through queries that merge data from the various tables into a single convenient ‘view’ that contains all of the fields you are likely to need. The following views are currently included in each .motus file:
- allambigs: lists in long-data format each motusTagID (up to 6) associated with each negative ambigID.
- alltags: provides the full detection data for all tags, and all ambiguous (duplicate) tags, associated with your project. Ambiguous detections are repeated for each motusTagID represented by each ambigID.
Because the file is a dplyr::src_sqlite file, all of the dplyr functions can be used to filter and summarize the .motus database, without needing to first save the data as a flat file (a typical two-dimensional dataframe). The SQL format is very advantageous when you have a large file – the queries using SQL will be substantially faster than those done on a flat dataframe.
3.2 Database types
There are two types of tag detection databases available for download:
receiver database: includes all detections of any registered tags from a single receiver. A receiver database has a name like SG-1234BBBK5678.motus, where the name is the serial number of the receiver.
project database: includes all detections of your registered tags from across the Motus network. A tag project database has a name like project-123.motus, where the number is the Motus project ID.
These two databases correspond to the basic model of data sharing:
you get all detections of anyone’s tags by your receivers (i.e., one receiver tag database for each receiver you deploy).
you get all detections of your tags by anyone’s receivers (i.e., one project tag database for each of your Motus projects).
3.3 Load relevant R packages
Before we begin working with data, we need to load the required packages for this chapter. If you have not yet installed these packages (from github and CRAN) then please return to Chapter 2 and do so.
# load required 'motus' package from github require(motus)
3.4 Set system environment
Set the system environment time zone to Greenwich Mean Time (GMT), to ensure that you are always working in GMT. This is a very important step, and should be part of every working session. If you fail to do this, then two problems can arise. Times are stored in the Motus database in GMT, and if you do not keep your environment in GMT, then they can be inadvertently changed during import. Second, if tags have been detected across multiple time zones, then they can also inadvertently be changed.
Sys.setenv(TZ = "GMT")
3.5 Downloading tag detections
To import tag detections for your project or receiver, you need a numerical project id or character scalar receiver serial number.
The success of the Motus network is dependent on the timely upload of detection data from receivers, and on the maintenance of accurate and up to date tag and receiver metadata by collaborators. After downloading your data from the Motus server, users are encouraged to check for updated detection data (see sections 3.7 and 3.8) and metadata (see section 3.9) each time they run an analysis, because collaborators can add detection data and metadata at any time, and these could influence the completeness of your own detections data.
Be warned that large datasets can take some time (sometimes a few hours) to download from the Motus server when downloading for the first time. After the initial download, loading a .motus file into R and updating for any new data will be near instantaneous.
3.5.1 Download data for a project for the first time
All data downloads are completed using the tagme() function in the motus R package. This function will save an SQLite database to your computer with the extension “.motus”; further details on data structure are in section 3.1. The following parameters are available for the tagme() function:
- projRecv: integer project number OR a character scalar receiver serial number.
- new: if set to “TRUE”, it will create a new empty .motus file in your local directory. Do not use this parameter or set it to “FALSE” if you already have a .motus file.
- update: if set to “TRUE”, will download all available data to your existing .motus file. Must be set to “TRUE” on your first data download and any subsequent downloads if you wish to check for new data. Set to “FALSE” if you do not wish to check for new data (e.g., if working offline).
- dir: Your .motus data is automatically saved to your working directory, unless you specify a different location using this parameter.
- forceMeta: if set to “TRUE”, it will force an update of metadata to an existing .motus file.
Throughout this book we use sample data (see section 1.3) which has been assigned to project 176.
Let’s get started by downloading data by project - this will include all detections of your tags on any receiver.
Note that when downloading data from the Motus server for the first time, you must specify ‘new = TRUE’ and ‘update = TRUE’. You will also be prompted to login (see 3.5.2 below).
Unless the directory that you want your data saved in is stated explicitly within the function call, data will be downloaded to the current working directory.
# lets start by determining what our working # directory is so we know where our file will be # saved you can change the working directory with # setwd() getwd() # specify the project number you wish to download # detections for, in this case the sample data # project; proj.num <- 176 # As this is the first time you are downloading # data for project 176, set new = TRUE and update = # TRUE. This will create a .motus file in your # current working directory, which was shown above # using getwd() This will also create an SQL obect # in your R environment called 'sql.motus' sql.motus <- tagme(projRecv = proj.num, new = TRUE, update = TRUE) # OR you can specify a different location to save # the data by entering your preferred filepath. In # this example we save to our data folder using the # dir parameter: sql.motus <- tagme(projRecv = proj.num, new = TRUE, update = TRUE, dir = "C:/Users/guest/Documents/data/")
Using tagme() as shown above will download a file to your working or specified directory called “project-176.motus” for the sample data (the number in the file name corresponds to the project number). The progress of the download process should print on the console; if you are not seeing it, try scrolling down your screen while tagme is running.
In the event that your connection to the Motus server fails prior to a complete download (e.g., due to a poor internet connection), use ‘tagme(proj.num, update = TRUE)’ to continue the download from where it left off, ensuring to specify a directory if it is saved outside the working directory.
3.5.2 User Authentication
The first time you call a function using the Motus R package, you will be asked to enter your motus.org username and password in the R console to authenticate your access to project data. This will only happen once per R session. If you do not have a Motus username and password, you can sign up at https://motus.org/data/user/new. Permission to access project data will then be granted by Motus staff or the project principal investigator.
Throughout this book we use sample data (see section 1.3) which has been assigned to project 176. When accessing this data you will need to login using username and password ‘motus.sample’ in the R console when prompted by the tagme() function (see section 3.5.1). It will look like this:
To download data for one of your own projects, you simply need to change the project number to that of your own project in the tagme() call, and enter your own Motus login/password in the R console when prompted. If you are already logged in as the sample data user, you will need to first logout to download your own data (see 220.127.116.11).
18.104.22.168 Logging out
Once you are logged in under one user account, you will not be able to access data from another account. If you need to logout of the current account to access other data, you can run the code below.
3.5.3 Download data for a receiver for the first time
We could also download data by receiver through the same process as described above. This will provide you with all detections of any tags on the specified receiver. As there are no receivers registered to sample project 176, this call will not work. If you have a receiver registered to your own project, replace the receiver serial number in the tagme call below with the serial number for your own receiver, ensuring that you are logged in using your own credentials (see section 22.214.171.124).
proj.num = "SG-123BBBK1234" sql.motus <- tagme(projRecv = proj.num, new = TRUE, update = TRUE)
This will download a file to your working directory named ‘SG-123BBBK1234.motus’.
Users may wish to work directly with the .motus SQLite file. However, since many users are more familiar with a ‘flat’ dataframe format, instructions to view the the data as a flat dataframe within R, and on how to export the flat file to .csv or .rds format, are included below. Throughout the majority of this book, we use a flat dataframe format.
3.5.4 Accessing downloaded detection data
Now that we’ve downloaded our data as an SQLite database and loaded it into an R object called ‘sql.motus’, we want to access the tables stored within. Detailed descriptions of all the tables stored in the .motus file can be found in section 3.1.
You can also view the list of tables, and variables contained within those tables, using the code below:
# specify the filepath where your .motus file is # saved, and the file name. file.name <- dbConnect(SQLite(), "./data/project-176.motus") # get a list of tables in the .motus file specified # above. dbListTables(file.name) # get a list of variables in the 'species' table in # the .motus file. dbListFields(file.name, "species")
The virtual table ‘alltags’ contains the detection data, along with all metadata variables that most users will ever need from the various underlying .motus tables. We access the tables using the tbl() function from the dplyr package which we installed in Chapter 2:
# this retrieves the 'alltags' table from the # 'sql.motus' SQLite file we read in earlier tbl.alltags <- tbl(sql.motus, "alltags") # virtual table
We now have a new ‘tbl.alltags’ object in R. The underlying structure of these tables is a list of length 2:
The first part of the list, ‘src’, is a list that provides details of the SQLiteConnection, including the directory where the database is stored. The second part is a list that includes the underlying table. Thus, the R object ‘alltags’ is a virtual table that stores the database structure and information required to connect to the underlying data in the .motus file. As stated above, the advantage of storing the data in this way is that it saves memory when accessing very large databases, and functions within the dplyr package can be used to manipulate and summarize the tables before collecting the results into a typical ‘flat’ format dataframe.
If you want to use familiar functions to get access to components of the underlying data frame, then use the ‘collect’ function. For example, to look at the names of the variables in the alltags table:
tbl.alltags %>% collect() %>% names() # list the variable names in the table
##  "hitID" "runID" "batchID" ##  "ts" "sig" "sigsd" ##  "noise" "freq" "freqsd" ##  "slop" "burstSlop" "done" ##  "motusTagID" "ambigID" "port" ##  "runLen" "bootnum" "tagProjID" ##  "mfgID" "tagType" "codeSet" ##  "mfg" "tagModel" "tagLifespan" ##  "nomFreq" "tagBI" "pulseLen" ##  "tagDeployID" "speciesID" "markerNumber" ##  "markerType" "tagDeployStart" "tagDeployEnd" ##  "tagDeployLat" "tagDeployLon" "tagDeployAlt" ##  "tagDeployComments" "fullID" "deviceID" ##  "recvDeployID" "recvDeployLat" "recvDeployLon" ##  "recvDeployAlt" "recv" "recvDeployName" ##  "recvSiteName" "isRecvMobile" "recvProjID" ##  "antType" "antBearing" "antHeight" ##  "speciesEN" "speciesFR" "speciesSci" ##  "speciesGroup" "tagProjName" "recvProjName" ##  "gpsLat" "gpsLon" "gpsAlt"
To convert the ‘alltags’ view or other table in the .motus file into a typical ‘flat’ format, i.e., with every record for each field filled in, use the ‘collect()’ and ‘as.data.frame()’ functions. The output can then be further manipulated, or used to generate a RDS file of your data for archiving or export.
df.alltags <- tbl.alltags %>% collect() %>% as.data.frame()
Now we have a flat dataframe of the alltags table called ‘df.alltags’. We can look at some metrics of the file:
names(df.alltags) # field names str(df.alltags) # structure of your data fields head(df.alltags) # prints the first 6 rows of your df to the console summary(df.alltags) # summary of each column in your df
Note that the format of the time stamp (ts) field is numeric and represents seconds since January 1 1970. We recommend that when you transform your tables into flat dataframes, that you format the time stamp using the lubridate package at that time, e.g.:
df.alltags <- tbl.alltags %>% collect() %>% as.data.frame() %>% # for all fields in the df (data frame) mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01")) # the tz = "UTC" is not necessary here, provided you have set your system time to UTC/GMT # ... but it serves as a useful reminder!
Note that time stamps can only be manipulated in this way after collecting the data into a flat dataframe.
If you want to load only part of your entire virtual table (e.g. certain fields, certain tags, or all tags from a specified project or species), you can use dplyr functions to filter the data before collecting into a dataframe. Some examples are below:
- To select certain variables:
# to grab a subset of variables, in this case a # unique list of Motus tag IDs at each receiver and # antenna. df.alltagsSub <- select(tbl.alltags, recv, port, motusTagID) %>% distinct() %>% collect() %>% as.data.frame()
- To select certain tag IDs:
# filter to include only motusTagIDs 16011, 23316 df.alltagsSub <- filter(tbl.alltags, motusTagID %in% c(16011, 23316)) %>% collect() %>% as.data.frame() %>% mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
- To select a specified species:
# filter to only Red Knot (using speciesID) df.4670 <- filter(tbl.alltags, speciesID == 4670) %>% collect() %>% as.data.frame() %>% mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01")) # filter to only Red Knot (using English name) df.redKnot <- filter(tbl.alltags, speciesEN == "Red Knot") %>% collect() %>% as.data.frame() %>% mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
Using dplyr(), your virtual table can also be summarized before converting to a flat file. For example, to find the number of different detections for each tag at each receiver:
df.detectSum <- tbl.alltags %>% group_by(motusTagID, recv) %>% tally() %>% collect() %>% as.data.frame()
In later chapter(s) we will show you additional ways of summarizing and working with your data.
3.6 Export your ‘flat’ dataframe to CSV or RDS file
A good workflow is to create a script that deals with all your data issues (as described in later chapters), and then saves the resulting dataframe for re-use. If you do this, you can quickly start an analysis or visualization session from a known (and consistent) starting point. We use an .rds file, which preserves all of the associated R data structures (such as time stamps).
# save the df.alltags file as an RDS file in your # data folder saveRDS(df.alltags, "./data/df.alltags.RDS")
Some users may also want to export the flat dataframe into a .csv file for analysis in other programs. This can easily be done with the following code. Note that it does not preserve time stamps:
# save the df.alltags file as a .csv file in your # data folder write.csv(df.alltags, "./data/df.alltags.CSV")
3.7 Update and/or open an existing database
As you or other users upload data to our server, you may have additional tag detections that weren’t present in your initial data download. Since the .motus file is a SQLite database, you can update your existing file with any newly available data, rather than doing a complete new download of the entire database. To open and update a detections database that already exists (has been downloaded previously), we use the tagme() function but set new = FALSE:
sql.motus <- tagme(projRecv = proj.num, new = FALSE, update = TRUE, dir = "./data/") # use dir = to specify a directory
If you are working offline, and simply want to open an already downloaded database without connecting to the server to update, use new = FALSE and update = FALSE:
sql.motus <- tagme(projRecv = proj.num, new = FALSE, update = FALSE) # use dir = to specify a directory
3.8 Check if new data are available
To check if new data are available for your project or receiver without downloading the data, you can use the tellme() function, which returns a list with:
- numHits: number of new tag detections.
- numBytes: approximate uncompressed size of data transfer required, in megabytes.
- numRuns: number of runs of new tag detections, where a run is a series of continuous detections for a tag on a given antenna.
- numBatches: number of batches of new data.
- numGPS: number of GPS records of new data.
The following assumes that a local copy of the database already exists:
tellme(projRecv = proj.num) # If db is in the working directory tellme(projRecv = proj.num, dir = "./data/") # To specify a different directory
To check how much data is available for a project but you do not have a database for it, use the ‘new’ parameter:
tellme(projRecv = proj.num, new = TRUE)
3.9 Force an update/re-import of tag and receiver deployment metadata
Tag and receiver metadata are automatically merged with tag detections when data are downloaded. However, if metadata have been updated since your initial download, you can force re-import of the metadata when updating a database by running:
sql.motus <- tagme(projRecv = proj.num, forceMeta = TRUE)
3.10 Import full tag and receiver metadata
When you use tagme() to download or update your .motus file, you are provided with the metadata for:
- any tags registered to your project which have detections;
- tags from other projects which are associated with ambiguous detections (see Chapter 5) in your data;
- receivers that your tags and any ambiguous tags were detected on.
In many instances, you will want access to the full metadata for all tags and receivers across the network, e.g., to determine how many of your deployed tags were not detected, or to plot the location of stations with and without detections. The metadata() function can be used to add the complete Motus metadata to your .motus file. The metadata function only needs to be run once, but we suggest that you re-import the metadata occasionally to ensure that you have the most recent and up-to-date information.
Running the metadata function as follows will add the appropriate metadata from across the network (all tags and all receivers) to the ‘recvDeps’ and ‘tagDeps’ tables in your .motus file:
# access all tag and receiver metadata for all # projects in the network. metadata(sql.motus)
Alternatively, you can load metadata for a specific project(s) using:
# access tag and receiver metadata associated with # project 176 metadata(sql.motus, projectIDs = 176) # access tag and receiver metadata associated with # projects 176 and 1 metadata(sql.motus, projectIDs = c(176, 1))
3.11 Ensure that you have the correct database version
When you call the tagme function to load the sqlite database, the version of the R package used to download the data is stored in an admInfo table. Over time, changes will be made to the functionality of the R package that may require adding new tables, views or fields to the database. If your version of the database does not match the version of the R package, some of the examples contained in this book may not work. The following call will check that your database has been updated to the version matching the current version of the motus R package. If your database does not match the most current version of the R package, see Chapter 2 for instructions on updating the motus and motusClient R packages, and then use tagme() with update = TRUE to update your database to the correct format. Refer to Appendix B if the checkVersion() call returns a warning.
3.12 R object naming convention
Throughout this chapter and the rest of the book, we name R objects according to their structure and the source of the data contained in the object. So, SQLite objects will be prefixed with ‘sql.’, virtual table objects will be prefixed with ‘tbl.’, and dataframe objects will be prefixed with ‘df.’; the rest of the name will include the name of the .motus table that the data originates from. Throughout the rest of the book we will be relying on and referencing the naming formats below; please ensure that you are familiar with these before continuing to the next chapter. The following code assumes you have already downloaded the sample data and do not need to update it; if you have not downloaded the data, see section 3.5.1 for instructions on initial download:
# SQLite R object, which links to the .motus file: sql.motus <- tagme(176, update = TRUE, dir = "./data") # virtual table object of the alltags table in the sample.motus file: tbl.alltags <- tbl(sql.motus, "alltags") df.alltags <- tbl.alltags %>% collect() %>% as.data.frame() %>% # dataframe ("flat") object of alltags table mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
3.13 Summary: download, access, and export data
Throughout the book we will predominantly be working with the ‘alltags’ table in flat format. Although described in detail above, here we include a quick summary of how you download, access, and convert the sample data for use in the rest of the book. In later sections of the book, we include additional recommended modifications and filtering of unnecessary variables, more information is available in section 5.1 in Chapter 5.
# set proj.num to 176 for sample data proj.num <- 176 # download and load detection data from the sample # project and save in data folder login and # password for sample data is 'motus.sample' if you # are accessing already-downloaded data, use new = # FALSE; if you don't want to update your data, # also set update = FALSE sql.motus <- tagme(proj.num, new = TRUE, update = TRUE, dir = "./data/") # access the 'alltags' table within the SQLite file tbl.alltags <- tbl(sql.motus, "alltags") # convert 'tbl.alltags' to a flat dataframe and # change numeric time to a datetime object df.alltags <- tbl.alltags %>% collect() %>% as.data.frame() %>% mutate(ts = as_datetime(ts, tz = "UTC", origin = "1970-01-01"))
For your own data we suggest creating a script with the following workflow:
- download and/or update your data
- select variables of interest for the table you are working with (typically ‘alltags’)
- include any initial cleaning
- save the resulting data as an .rds file as described in section 3.6. We suggest using RDS instead of CSV, because the RDS format preserves the underlying structure of the data (e.g. POSIX times stay as POSIX times). If you want to export your data to another program, then a CSV format might be preferred.
We caution that producing a flat file using the full suite of fields can use a lot of memory, and can slow R down considerably when dealing with large datasets. For some combinations of data sets and computers, it may be impossible to directly use data frames in R. If that is the case, then this is the point in your workflow where you should carefully consider the information you need from within your data set (for example, how it is aggregated) and simplify it. You can always return to this script and creating a new RDS file with different variables, or aggregated at a different scale.
To read in a saved RDS file you can run:
df.alltags.saved <- readRDS("./data/df.alltags.rds") # reads in your file 'df.alltags.rds' saved in the data folder
In the next chapter we will check for missing metadata.