The intricacies of research studies reflect the complex questions they aim to answer. Hierarchical relational databases, especially as implemented in MetaReviewer, bring a structured approach to capturing this complexity, ensuring that review teams spend more time interpreting their data and less time worrying about how to organize it. MetaReviewer’s interface makes it easy to create an elegant data extraction system and simple to produce spreadsheets that apply the principles of hierarchical relational databases.
What are Hierarchical Relational Databases?
Hierarchical relational databases (HRDs) are an approach to structuring complex, clustered data. HRDs are necessary whenever we have data that are nested beneath a higher-order unit, like when a study reports multiple measures of the same variable or multiple studies are published in the same paper. When data are structured in this way, we need some way of connecting or linking the multiple measures to the single study.
You may be familiar with the concept from primary research. Participants within an organization are said to be nested (or clustered) within that organization. Patients or doctors within a single medical facility; students or teachers within a single school; residents within a single building or block all represent examples of nested, primary data.
From a meta-analysis perspective, it may be illustrative to explain the HRD structure by explaining what is not an HRD structure. In the case of multiple measures within one study, a non-HRD approach using a simple, single spreadsheet is to represent each measure with its own row, repeating all other information in the remaining columns. A study with five different measures would thus be represented with five different rows, where the measure information changes for each row, but the other information remains constant.
HRDs attempt to reduce this repetition, creating an efficient and non-repetitive structure. In the case of multiple measures within a single study, a user will create a new row for each measure like the single spreadsheet style. But instead of copying the other information along with each row, a series of IDs represents the remaining information that is captured in other, connected parts of the spreadsheet. At the end of the process, HRDs use these IDs to link aspects of each study together.
Several key concepts about HRDs are worth describing because the HRD language requires differentiating between multiple sources and identifiers. Each spreadsheet is a ‘table’; each column is a ‘field’; each row is a ‘record’. A major difference is the concept of IDs – the Primary and Foreign IDs. The Primary ID is established on each table; it is the main ID that links all records back to the table. A Foreign ID is the use of the Primary ID in another table; it is what links together the first and second tables – the Foreign ID shows us how to link the two tables together.
Why is it important to use an HRD?
You may be wondering: ‘we just doubled or maybe even tripled our initial work, why would we go through the trouble of creating two additional spreadsheets and all of these various IDs?’ It’s a good question! And a common one among people we work with.
The answer lies in the traditional spreadsheet format. To illustrate this concept, let’s briefly return to a previous #MetaReviewerMonday Twitter/X Post about Taylor Swift discography (see the original posts here). In that example, we represented her discography in a traditional ‘flat’ spreadsheet and within an HRD. Now take a close inspection of the scrolling text in the Gif below, especially the first six columns on the left: the information remains static while the lyrics scroll by one after the other. It’s repetitive across all the lyrics because it is the exact same information: Lyric number 22 has the same Album and Song title as Lyric 33.
Imagine you are tasked with coding each lyric from each song and album. Perhaps you also want to code whether each lyric included a swear word, a drink reference, or a love statement. Using the traditional spreadsheet approach, your sheet would copy and paste the same 50 rows of information that represent the album and song! In addition to being repetitive, there’s a good chance you’ll make a mistake along the way.
An HRD captures each piece of information only once, within a table designed specifically for that information. There’s no repetitive information, no copying and pasting, no endless scrolling. Enter the information about an album or song once, then at the very end, combine each piece of information in your HRD back together to create one big spreadsheet for analysis.
How should synthesists think about HRDs?
The example using Taylor Swift’s discography is illustrative because you probably understand the connection between albums, songs, and lyrics. It’s a concept we naturally apply to all artists when organizing their work.
How should we – users and creators of systematic reviews and meta-analyses – think about HRDs? Let’s adapt an extant meta-analysis example. Suppose we are interested in the effect of language comprehension interventions on literacy, like Sliverman and colleagues (2020). In a future blog post, we will walk through the steps of creating a full codebook using our Google Document templates. (If you are curious right now, check out our video on the process.) To view the entire literacy database, check out the google sheet here.
We first create the ‘highest level’ of our database. In our literacy example, studies are the highest-level unit. The Silverman meta-analysis contained 42 unique studies, which we processed into 42 unique ID_Study values. We represent each with S01, S02, etc. We also capture the study authors (Authors) and the study’s date of publication (Date_of_Publication).
Next we create new tables to represent lower-level units that are nested within studies. For example, we will create a second table that represents the outcome measures, since some studies measured multiple outcomes. We also include a Foreign ID, in this case ID_Study, that links each outcome to its respective higher-level unit—the study.
Our final table represents the effect size, which is nested within outcomes since a single outcome construct (such as X) can be indexed using multiple measures (such as Z and Y). We create another Primary ID variable, in this case one that represents each effect size (ID_ES), and we include two Foreign IDs that link the effect size to its respective study (ID_Study) and the outcome measure (ID_Outcome). We also include the calculated effect size (ES) and its lower and upper confidence intervals (CI_Low and CI_Up, respectively). The first row says that the effect is from study S03 (Arthur & Davis, 2016) and measured outcome O01 (vocab, custom).
Finally, when we are ready to analyze the collected information, we use the IDs to create a consolidated, “flat” spreadsheet that links all the previous tables back together. MetaReviewer’s export functionality does the linking for you, so you don’t need to worry about how to do this.
Bringing is all together: Where is the HRD in MetaReviewer?
You might be thinking that this all sounds great but actually creating the HRD seems difficult. If you were not using MetaReviewer, you’d be correct.
Helping applied researchers who don’t have a data engineering degree (or any data science degree, for that matter) is exactly why we created MetaReviewer. We recognize the importance of HRDs for accurately, reliably, and efficiently collecting meta-analytic data. So we want to make the process of doing so not only possible, but painless!
MetaReviewer’s Google Document coding form templates and coding form importer do the hard work for you. In a future Learn blog post, we will show you how to modify a template for your project, import the template, and visualize exactly how MetaReviewer uses the principles of HRDs to take the pain out of the coding process.