--- title: "Getting Started" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{getting-started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, include = FALSE} library(loggit) ``` `loggit` is an easy-to-use, yet powerful, [`ndjson`](https://github.com/ndjson) logger. It is very fast, has zero external dependencies, and can be as straightforward or as integral as you want to make it. R has a selection of built-in functions for handling different *exceptions*, or special cases where diagnostic messages are provided, and/or function execution is halted because of an error. However, R itself provides nothing to record this diagnostic post-hoc; useRs are left with what is printed to the console as their only means of analyzing the what-went-wrong of their code. There are some slightly hacky ways of capturing this console output, such as `sink`ing to a text file, repetitively `cat`ing identical exception messages that are passed to existing handler calls, etc. But there are two main issues with these approaches: 1. The console output is not at all easy to parse, so that a user can quickly identify the causes of failure without manually scanning through it 2. Even if the user tries to structure a text file output, they would likely have to ensure consistency in that output across all their work, and there is still the issue of parsing that text file into a familiar, usable format Enter: [JSON](https://www.json.org/) For those unaware: JSON is a lightweight, portable (standardized) data format that is easy to read and write by both humans and machines. An excerpt from the introduction of the JSON link above: >JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write. It is easy for machines to parse and generate. It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999. JSON is a text format that is completely language independent but uses conventions that are familiar to programmers of the C-family of languages, including C, C++, C\#, Java, JavaScript, Perl, Python, and many others. These properties make JSON an ideal data-interchange language. Basically, you can think of JSON objects like you would think of `list`s in R: a set of named key-value pairs. Since R `list`s are subsets of the `data.frame` class, logs written by `loggit` are easily retrievable as data frames -- this means you can analyze your log data, *right from your R code!* What `loggit` does a bit differently is write logs as *newline-delimited JSON* (`ndsjon`). Instead of a JSON file that looks like this: ``` [ { "key1": "value1" }, { "key2": "value2" } ] ``` `loggit`'s logs containing the same data will instead put each object on its own line: ``` {"key1": "value1"} {"key2": "value2"} ``` This makes the log entries themselves exhibit very fast disk write speeds, while still being machine-parsable, human-readable, and ideal for log stream collection systems (like the `stdout` of your terminal, or a container in Docker or Kubernetes). 
How to Use `loggit`
-------------------

To write a log entry using `loggit` via its exception handlers, you just load `loggit`, set its log file location, and use the same handlers you always do:

```{r handlers_0, eval = FALSE}
library(loggit)
set_logfile("/path/to/my/log/directory/loggit.log") # loggit enforces no specific file extension
```

```{r handlers}
message("This is a message")
warning("This is a warning")
# stop("This is a critical error, so I'm not actually going to run it in this vignette")
```

You can see that the handlers print both the `loggit`-generated log entry and their base default output. To print only the JSON entry, wrap the call in the appropriate suppressor (i.e. `suppressMessages()` or `suppressWarnings()`). To print only the base text, pass `echo = FALSE` to the handler.

And... that's it! You've introduced human-readable, machine-parsable logging into your workflow!

However, surely you want more control over your logs. Behind the scenes, `loggit`'s core function, also called `loggit()`, is executed right before the base handlers with some sane defaults. But the `loggit()` function is also exported for use by the developer:

```{r loggit_func}
loggit("INFO", "This is also a message")
loggit("WARN", "This is also a warning")
loggit("ERROR", "This is an error, but it won't stop your code from running like `stop()` does")
```

*"But why wouldn't I just use the handlers instead?"*

Because `loggit()` exposes much greater flexibility to the user, by way of *custom fields*.

```{r custom_fields}
loggit(
  "INFO",
  "This is a message",
  but_maybe = "you want more fields?",
  sure = "why not?",
  like = 2,
  or = 10,
  what = "ever"
)
```

Since JSON is considered *semi-structured data* (sometimes called "schema-on-read"), you can log any custom fields you like, as *inconsistently* as you like. It all just ends up as text in a file, with no column structure to worry about.

So, `loggit`'s log format is a special type of JSON. JSON objects are like `list`s -- and so are `data.frame`s. To allow for the most flexibility, the `read_logs()` function is available to you, which reads the currently-set log file into a data frame:

```{r read_logs}
read_logs()
```

Notice that `read_logs()` handles the columnar inconsistencies mentioned above: if one log entry has a field that other entries don't, `read_logs()` fills that field with an empty string for the entries that lack it. This was chosen over `NA` to keep the data consistent on re-write; you can, however, replace the empty strings with `NA` after reading, if you want to.

You can also pass a file path to `read_logs()`, and read that `loggit` log file instead.

The other helpful utilities are as follows:

- You can control the format of the timestamp in the logs; it defaults to ISO format `"%Y-%m-%dT%H:%M:%S%z"`, but you may set it yourself using `set_timestamp_format()`. Note that this format is ultimately passed to `format.Date()`, so the supplied format needs to be valid.
- You can control the output name & location of the log file using `set_logfile(logfile)`. Similarly, you can retrieve the location of the current log file using `get_logfile()`.

Things to keep in mind
----------------------

- `loggit` will default to writing to an R temporary directory. As per CRAN policies, ***a package cannot write*** to a user's "home filespace" without approval. Therefore, you need to set the log file before any logs are written to disk, using `set_logfile(logfile)` (I recommend your working directory, naming it "loggit.log").
If you are using `loggit` in your own package, you can wrap this in a call to `.onLoad()`, so that logging is set on package load; a minimal sketch of that approach is shown below. If not, then call `set_logfile()` as early as possible (e.g. at the top of your script(s), right after your calls to `library()`); otherwise, no logs will be written to persistent storage!
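Here is that `.onLoad()` sketch. The log file location (a "loggit.log" file in the working directory) and the explicit timestamp-format call are illustrative choices for this example, not requirements:

```{r onload_sketch, eval = FALSE}
# Conventionally placed in your package's R/zzz.R
.onLoad <- function(libname, pkgname) {
  # Illustrative choice: log to "loggit.log" in the working directory
  loggit::set_logfile(file.path(getwd(), "loggit.log"))

  # Optional: pin the timestamp format (this is already loggit's default)
  loggit::set_timestamp_format("%Y-%m-%dT%H:%M:%S%z")
}
```

Outside of a package, the same two calls at the top of your script accomplish the same thing.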