Open Refine for Librarians

Wednesday, February 19, 2020

Recipe :Splitting out an OCLC number (or a number with a label prefix) out of a delimited list

I've been working a lot with Alma Analytics, and before the February release, our 035 field was repeated. This unfortunately meant that if I used the "network number" field for Analytics, I would get this:

(CaPaEBR)ebr10975975; (OCoLC)654843296; (CU-SC)b44543517-01cdl_scr_inst

You could use a formula in Analytics to give you just the OCLC number, but you could also do it in Open Refine. The recipe is this:

forEach(value.split(";"), v, if (contains(v, "<prefix label for the number you want to split out>"), v, "")).join("")

So, for example, if I want to split out the OCLC number, I would use:
forEach(value.split(";"), v, if (contains(v, "(OCoLC)"), v, "")).join("")

If I wanted the CaPaEBR number, I would use:
forEach(value.split(";"), v, if (contains(v, "(CaPaEBR)"), v, "")).join("")

If I wanted our old bib number (which is the last number in this string), I would use:
forEach(value.split(";"), v, if (contains(v, "(CU-SC)"), v, "")).join("")

Friday, January 25, 2019

Recipe: Using filters to identify all the blank columns in a spreadsheet

Rachel had an interesting problem for me - she had a spreadsheet with a plethora of columns. Some had information, and some were blank. It was tedious trying to use faceting to identify the blank columns, so she wondered if there was a way to quickly identify them in Open Refine.

The answer is quite simple, first identify how many total rows you have on your spreadsheet. Then, go to any column in Open Refine.

Then do facet->custom text facet.

In the box, enter the following GREL:

filter(row.columnNames,cn,isBlank(cells[cn].value))

This will make a facet out of each column, with the number of blank rows listed to the right:

There are 414 rows total on this spreadsheet, so any column with "414" next to it is completely blank.

Thursday, April 12, 2018

Recipe: Computing the Millennium checkdigit

More detail will be posted later. I just wanted to note this down.

We needed to extract our course reserves in Millennium into .csv format and the check digit for the item records associated with the instructors was cut off. To further complicate matters, one of our classes had about 106 associated items.

In order to calculate the check digit, you had to do the following calculation:

for item record i1234567, you had to multiply the 7 by 2, the 6 by 3, the 5 by 4, the 4 by 5, the 3 by 6, the 2 by 7, and the 1 by 8. Then, you had to add all those values together, divide by 11, and the remainder was the check digit. Anything that was remainder 10 became check digit x. So, for this item record, the summation is 112. Divide that by 11, and the remainder is 2 - so the full item record number is i12345672.

1. First, I changed the , delimiter right before the item record numbers to } in Notepad++. That way, when I imported to Open Refine, item records i3070797,i3054061,i3054062,i3054064,i3070931,,,,,,,,,,,,,

would all be in the same cell.

2. I split the column of item records into multi-valued cells using the , as a delimiter. My column was named "Column2" for this example.
3. I used add column based on this column and created column "remainder" with this GREL: value.replace ("i","")
This is so I could do the calculation for the check digit in a separate column to make it easier to concatenate later.
3. To do the multiplication and summation calculation, I did a transform on the remainder column and input this GREL:
sum(forEachIndex(reverse(splitByLengths(value,1,1,1,1,1,1,1)),i,v, (i+2)*toNumber(v)))
4. Unfortunately, this inserted a lot of zeros into the fields I wanted to leave blank, so I did another transform on remainder and did this to blank out the zeros: if(value == 0, "", value)
(You could probably just facet on the 0 and blank it out, too). This step has to be done, otherwise the calculation of the remainder will fill in the cells with the zeros with unwanted values.
5. Do another transform on column remainder to calculate the remainder:
mod(value,11)
6. Now concatenate Column2 and remainder together (can be either a transform or a new column):
if(isBlank(value), "",if (cells.remainder.value < 10, value + cells.remainder.value, value + "x"))
(Leaving out the check for isBlank gave me funky values again.)
7. Now remove any extra columns and join multi-valued cells on the concatenated column.

Wednesday, October 4, 2017

Recipe: Populating a column with sequential call numbers

You can actually fill a column in Open Refine with sequential call numbers. All you have to do is take advantage of the fact that row.index returns an integer.

Our local accession numbers for video games are GVD<number>, so if I wanted to populate a spreadsheet with :

GVD1100
GVD1101
GVD1102
GVD1103
etc.

You'd do : "GVD" + (row.index + 1100)
or
"<prefix>" + (row.index + <starting call number>)

Recipe and walkthrough - Joining the first two cells in a column and moving the third up

If I had this sample spreadsheet:

And I wanted to transform it to this spreadsheet:

I can use the column array conversion, a trick with array arithmetic, and nested ifs to do so in one GREL expression. Or I can use three transforms.

Assuming that all data in the "Subject" column is three subject and only three subjects per call#, and that there are no blank lines in the column, do the following:

1. Add a blank column.
2. Put in a placeholder value in first cell.
3. Move column to the beginning. If using an older version of Open Refine, switch to record view.
4. Transform with: if(mod(rowIndex, 3) == 0, row.record.cells["Subject"].value[rowIndex] + "; " +row.record.cells["Subject"].value[rowIndex+1], value)
5. Transform subject column again: if(mod(rowIndex, 3) == 1, row.record.cells["Subject"].value[rowIndex+1], value)
6. Last transform: if(mod(rowIndex, 3) == 2, "", value)
7. If you were in record view, switch back to rows. Delete placeholder column.

OR:

1. Add a blank column.
2. Put in a placeholder value in first cell.
3. Move column to the beginning. If using an older version of Open Refine, switch to record view.
4. Transform on the subject column and use this piece of GREL:

if(mod(rowIndex, 3) == 0, row.record.cells["Subject"].value[rowIndex] + "; " +row.record.cells["Subject"].value[rowIndex+1], if(mod(rowIndex,3)== 1, row.record.cells["Subject"].value[rowIndex+1], ""))
5. If you were in record view, switch back to rows. Delete placeholder column.

Recipe and walkthrough - Join selected cells in a column

Again, I'll be posting a recipe first and then a detailed walkthrough below.

Recipe:
Given this truncated sample data:

Assuming that there are more entries that aren't being shown, if I only want to join the two subject headings in Column 4 for Ammanati, Bartolomeo:

Make sure you're in the row view before you proceed
Add column based on any column, with whatever name (I'm calling it index), and "" in the Expression box.
Edit the very first cell in index, put in a placeholder value. (NOTE: put in one placeholder value only, doing two won't work.)
Move index to the first column position.
Switch to record view. (Note : this appears to be optional in Open Refine 2.9, but you may need to do it for older versions.)
Because I found it hard to read recipes with other people's column names, I am referring to Column 2, as the <test column>. (Since this is the column you use as the test condition for the if statement). Column 4 will be referred to as the <edit column>, to indicate which column I want to edit.

Transform on <edit column> : if(cells["<test column>"].value.contains("Ammanati"), row.record.cells["<edit column>"].value[rowIndex] + ";" + row.record.cells["<edit column>"].value[rowIndex+1], value)

Transform on <test column> and create a placeholder row: if(row.record.cells["<test column>"].value[rowIndex-1].contains("Ammanati"), "place", value)
Switch to row view.
Custom text facet on <test column>: value.contains("place"), select true
Star the placeholder rows.
Remove the custom facet and facet by star
All->edit rows->remove matching rows.
Remove the index column.

Long, detailed explanation:

Custom text faceting with booleans cheat sheet

Search Criteria (up to Open Refine 2.8)	Custom text facet
Facet on strings that have both <string one> and <string two>	and (value.contains("<string one>"), value.contains("<string two>"))
Facet on strings that have either <string one> or <string two>	or (value.contains("<string one>"), value.contains("<string two>"))
Facet on strings that have <string one> and <string two> and <string three>	and(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>")) Note: You can chain as many value.contains() statements as you want.
Facet on strings that have <string one> or <string two> or <string three>	or(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>")) Note: You can chain as many value.contains() statements as you want.
Facet on strings that don’t contain <string one>	not(value.contains("<string one>")
Facet on strings that don’t contain <string one> or <string two>	not(or (value.contains("<string one>"), value.contains("<string two>")))
Facet on strings that contain (<string one> and <string two>) or <string 3>	or(and (value.contains("<string one>"), value.contains("<string two>")), value.contains("<string three>") )
regex search : Facet on strings that contain "Playstation" or "PlayStation" or "Nintendo"	or(isNotNull(value.match(/.Play[sS]tation./)), value.contains("Nintendo"))

Search Criteria (Open Refine 3.0)	Custom text facet
Facet on strings that have both <string one> and <string two>	toString(and (value.contains("<string one>"), value.contains("<string two>")))
Facet on strings that have either <string one> or <string two>	toString(or (value.contains("<string one>"), value.contains("<string two>")))
Facet on strings that have <string one> and <string two> and <string three>	toString(and(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>"))) Note: You can chain as many value.contains() statements as you want.
Facet on strings that have <string one> or <string two> or <string three>	toString(or(value.contains("<string one>"), value.contains("<string two>"), value.contains("<string three>"))) Note: You can chain as many value.contains() statements as you want.
Facet on strings that don’t contain <string one>	toString(not(value.contains("<string one>"))
Facet on strings that don’t contain <string one> or <string two>	toString(not(or (value.contains("<string one>"), value.contains("<string two>"))))
Facet on strings that contain (<string one> and <string two>) or <string 3>	toString(or(and (value.contains("<string one>"), value.contains("<string two>")), value.contains("<string three>") ))
regex search : Facet on strings that contain "Playstation" or "PlayStation" or "Nintendo"	toString(or(isNotNull(value.match(/.Play[sS]tation./)), value.contains("Nintendo")))

Open Refine for Librarians

Wednesday, February 19, 2020

Recipe :Splitting out an OCLC number (or a number with a label prefix) out of a delimited list

Friday, January 25, 2019

Recipe: Using filters to identify all the blank columns in a spreadsheet

Thursday, April 12, 2018

Recipe: Computing the Millennium checkdigit

Wednesday, October 4, 2017

Recipe: Populating a column with sequential call numbers

Monday, October 2, 2017

Recipe and walkthrough - Joining the first two cells in a column and moving the third up

Thursday, September 28, 2017

Recipe and walkthrough - Join selected cells in a column

Friday, September 15, 2017

Custom text faceting with booleans cheat sheet

Favorite Sci-Fi Shows

Contributors

Labels

Blog Archive