3. Ethical issues in corpus building

Shelley Staples edited this page Nov 30, 2021 · 27 revisions

Getting started

Some corpora are built from publicly available data, such as writing posted on the web. Building corpora from public data usually does not require approval from an ethics board, though researchers should always consider the rights and welfare of the people whose writing or speech goes into a corpus.

If you are building a corpus from data such as workplace writing or student work, you'll need approval from your institution's IRB or ethics board. Our experience suggests that planning and cooperation can make this process much easier.

Seven steps

The Crow team suggests seven best practices for ethical corpus building:

  1. Determine the type of permissions you will need.
  2. Work closely with your IRB or ethics board.
  3. Build partnerships with writing program administrators.
  4. Collaborate with and support instructors.
  5. Develop a plan for gathering demographic data.
  6. Clarify the scope of sharing your corpus.
  7. Establish processes for secure data storage.

Below, we offer more detail about each of these, in turn.

Determine the type of permissions you will need.

Permission from participants is a key part of ethical writing research. Many institutions allow researchers to obtain permission in one of two ways:

Opt-out: You can collect data directly, for example by collaborating with instructors to get access to student work, and then notify participants that their writing can be excluded from your corpus if they wish.

Opt-in: Participants must actively agree that their writing can be used, by completing a consent form or online agreement. In most cases, opt-in approaches involve recruiting participants directly.

Your approach to securing permission should take your research questions into account. Both approaches require careful record keeping: recording who opts out, or storing consent forms and online agreements.
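The practical difference between the two models shows up when you filter a collection of texts against your permission records. The sketch below is a hypothetical illustration, not drawn from the Crow team's own scripts: the `Submission` record, the student IDs, and both helper functions are assumptions made for the example. Under opt-out, a text stays unless its ID appears in the opt-out record; under opt-in, a text is excluded unless a consent record exists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    # Hypothetical record: one text plus the ID used to track permissions.
    student_id: str
    text: str


def apply_opt_outs(submissions, opted_out_ids):
    """Opt-out model: keep every text except those from students who opted out."""
    excluded = set(opted_out_ids)
    return [s for s in submissions if s.student_id not in excluded]


def apply_opt_ins(submissions, consented_ids):
    """Opt-in model: keep only texts from students with a consent record on file."""
    allowed = set(consented_ids)
    return [s for s in submissions if s.student_id in allowed]


subs = [Submission("s1", "Essay one"), Submission("s2", "Essay two"), Submission("s3", "Essay three")]
print([s.student_id for s in apply_opt_outs(subs, {"s2"})])  # ['s1', 's3']
print([s.student_id for s in apply_opt_ins(subs, {"s2"})])   # ['s2']
```

Note the opposite defaults: a student with no record at all is included under opt-out but excluded under opt-in, which is one reason opt-in corpora tend to be smaller.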

Opt-in approaches can reduce participation rates and are more labor-intensive since each participant must be contacted directly. Plan accordingly.

Work closely with your IRB or ethics board.

Most American universities have an Institutional Review Board (IRB) that supervises human subjects research, though teaching-intensive institutions may rely on an outside board to perform this work.

Think of your IRB or ethics board as a partner that can help you be an ethical researcher, rather than as an enforcer of regulations.

IRB staff are often eager to collaborate directly with researchers. If IRB analysts hold walk-in hours or will meet with you to discuss your study, take advantage of that opportunity. Talking through your questions can save a great deal of time when filling out required forms or revising your answers. You should also read any tutorials, quick start guides, examples, or other documentation your IRB provides.

Build partnerships with writing program administrators.

Work closely with your local writing program administrators (WPAs) or other stakeholders. Without their cooperation, getting access to student work will be difficult. Indeed, university, college, or department policies may require WPAs to review research projects that involve their students as participants. This is the case at several Crow institutions: for example, the University of Arizona Writing Program requires all investigators to submit their IRB-approved protocol, along with a short narrative summary of their research, before gathering data from Writing Program classes.

Again, think of WPAs as partners: if you collaborate with them and offer to share findings or support their work, they can make gathering data much easier.

Reach out to program administrators early in the corpus-building process, and ask not only for their permission but also for their feedback on your research. Keep them in the loop, offer to share your findings, and look for ways to give back to the writing program in recognition of their help.

Collaborate with and support instructors.

As with your IRB and local administrators, communicating with instructors, and seeing them as partners, is absolutely essential for successful corpus building.

Instructors can provide access to student writing either by letting you recruit students in their classes or by sharing the assignments students submit through course management systems. Either way, develop relationships with instructors and help them see the value of your research.

Make sure you’re ready to answer instructors’ questions about your research goals and the ways you will use the student writing they help you collect. Consider offering incentives for participating instructors, or finding other ways to give back to them by helping them explore corpus-supported instruction.

Develop a plan for gathering demographic data.

Adding demographic information such as first language or major can make your corpus more useful by allowing users to perform targeted searches. You can see this in the Crow interface.

Screenshot of Crow interface showing metadata at right

Demographic data can be collected directly from participants (e.g., through surveys) or obtained from institutional partners such as your registrar or office of institutional research. Surveys offer the most control, but they are also labor-intensive, and participation can be low.

Crow researchers use surveys, but we obtain more data by working directly with each institution's registrar or institutional research office. This requires completing a data sharing agreement, usually as part of the IRB approval process. The data available, the methods for sharing it, and the restrictions on its use vary between institutions.
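Once demographic data arrives, it has to be linked to each text before users can run targeted searches like those in the Crow interface. The following is a minimal sketch, assuming the registrar supplies a CSV keyed by a student ID that also identifies each text; the column names, IDs, and helper functions are all hypothetical, and the real fields will depend on your data sharing agreement.

```python
import csv
import io

# Hypothetical registrar export; actual columns depend on your data sharing agreement.
REGISTRAR_CSV = """student_id,first_language,major
s1,Spanish,Biology
s2,English,History
s3,Mandarin,Engineering
"""


def load_metadata(csv_text):
    """Index registrar rows by student ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["student_id"]: row for row in reader}


def attach_metadata(texts, metadata):
    """Pair each text with its demographic fields; unmatched IDs get no extra fields."""
    records = []
    for student_id, text in texts.items():
        row = metadata.get(student_id, {})
        fields = {k: v for k, v in row.items() if k != "student_id"}
        records.append({"student_id": student_id, "text": text, **fields})
    return records


def search(records, **criteria):
    """Targeted search: keep records whose metadata match every criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


texts = {"s1": "Essay one", "s2": "Essay two", "s3": "Essay three", "s4": "Essay four"}
records = attach_metadata(texts, load_metadata(REGISTRAR_CSV))
print([r["student_id"] for r in search(records, first_language="Spanish")])  # ['s1']
```

Texts with no matching registrar row (here `s4`) stay in the corpus but lack demographic fields, so they drop out of any metadata-based search.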

Clarify the scope of sharing your corpus.

Sharing your corpus outside your institution is possible, but it must be planned in advance. Your IRB application should state your sharing plans clearly.

Careful deidentification (removing or redacting names, emails, and other identifying information from writing) can expand opportunities for sharing your corpus. If you plan to deidentify your data, describe that process in your IRB application. For most IRBs, once data has been fully deidentified, it is no longer considered human subjects data and can be shared more freely.
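An automated first pass can handle predictable identifiers such as email addresses and names from a class roster. The sketch below assumes you have such a list of names (hypothetical here) and uses a simplified email pattern; the `deidentify` function and the `<name>`/`<email>` placeholder tags are illustrative choices. A pass like this will miss nicknames, misspellings, and indirect identifiers, so its output still needs human review before the data can be called fully deidentified.

```python
import re

# Simplified email pattern; real addresses can take forms this does not cover.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def deidentify(text, known_names):
    """Replace email addresses and known names with placeholder tags."""
    text = EMAIL_RE.sub("<email>", text)
    # Redact longer names first so "Ana Lopez" is handled before "Ana".
    for name in sorted(known_names, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        text = pattern.sub("<name>", text)
    return text


sample = "Thanks, Professor Lee! Email me at ana.lopez@example.edu. - Ana Lopez"
print(deidentify(sample, ["Ana Lopez", "Lee", "Ana"]))
# Thanks, Professor <name>! Email me at <email>. - <name>
```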

If you do share with others, take steps to ensure your corpus is not misused or misrepresented. Participants’ ability to control their own writing and have it shared as described in consent forms or agreements is incredibly important.

Establish processes for secure data storage.

Your corpus data and research records need to be securely stored and carefully backed up.

Most institutions provide password-protected drives on servers that are backed up frequently. This keeps data secure and also reduces the likelihood of losing data to equipment failure or human error.

As you make plans to process data, ensure those plans include data management. Get in the habit of being systematic about handling data, including research records such as opt-out requests or consent forms. Ensure any collaborators follow suit.
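Being systematic about data also means being able to confirm that a backup actually matches the original. One common technique is a checksum manifest: record a SHA-256 hash for every file, then compare the manifests of the original and the backup copy. The sketch below uses only the Python standard library; the directory layout, file names, and helper functions are hypothetical, and a check like this complements rather than replaces your institution's backed-up storage.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(original, backup):
    """Return files missing from the backup and files whose contents differ."""
    missing = sorted(set(original) - set(backup))
    changed = sorted(p for p in original if p in backup and original[p] != backup[p])
    return missing, changed


# Usage: a backup that is missing one essay.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    Path(src, "essay_001.txt").write_text("First draft")
    Path(src, "essay_002.txt").write_text("Second draft")
    Path(dst, "essay_001.txt").write_text("First draft")
    missing, changed = compare_manifests(build_manifest(src), build_manifest(dst))
    print(missing, changed)  # ['essay_002.txt'] []
```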

Summary

Ethical writing research requires multiple partners:

  • IRB or ethics board consultants
  • Registrar or institutional research staff
  • Department chairs
  • Writing program administrators
  • Writing instructors

The relationships Crow has cultivated with our participants and other stakeholders have helped our research be successful.

Video presentation

A video version of this content is available on the Crow YouTube channel.

Video: Ethical issues in corpus building