The new OBO API: an overview

Introductory notes on using the new (2023) version of the Old Bailey Online API
Author

Sharon Howard

Published

18 May 2025

Introduction

As part of a major overhaul and refresh of the Old Bailey Online in 2023, there were major breaking changes to the OBO API. The available documentation for the new API is sparse, and these notes are intended to help users find their way around.

I cover two of the available API endpoints here:

  • https://www.dhi.ac.uk/api/data/oldbailey_record - for searching trials and other sections in the Old Bailey Proceedings
  • https://www.dhi.ac.uk/api/data/oldbailey_record_single - for full data about individual trials and other sections in the Proceedings

There are two other documented endpoints, one for Ordinary’s Accounts and one for the Associated Records database. I haven’t yet looked at these in any detail and may add further notes about them later, but would expect them to work in similar ways.

There is a further undocumented feature for obtaining aggregated data. I’ll cover that in a separate post.

See also this worked example to search and download data from the API using R.

For further reference:

  • detailed background information about the search categories (offences, verdicts, sentences/punishments, victims and defendants) can be found on the OBO website.
  • all the OBO search forms have extensive contextual help which is also relevant for understanding the workings of the API.

Notes on search parameters

The official documentation only mentions keyword searching with the text parameter. These notes are intended to fill in some of the gaps for more structured searches of offences, etc.

The crucial thing to know is that API URLs query strings mirror search results URLs on the website.

So, for example, the URL for the first page of results for a site search for the offence “killing”:

  • https://www.oldbaileyonline.org/search/crime?offence=kill

The corresponding api query url:

  • https://www.dhi.ac.uk/api/data/oldbailey_record?offence=kill

Note that the /crime/ segment of the website URL is not needed.

So generally you should be able to test out queries on the website first to work out what parameters you need for API queries, and I’m not going to list a lot of examples here. There could potentially be specific omissions/variations in the API that I’m not aware of, so I do recommend generally testing out results for whatever you want to do.

pagination

The example URL above will only get the first 10 results, but most searches return more than that. The parameter for subsequent pages is from. Numbering starts from 0, not 1.

Eg, to get the second page of results: from=10

So, the second page for the example above: https://www.dhi.ac.uk/api/data/oldbailey_record?offence=kill&from=10

dates and date ranges

A range specifying month and year (March 1739-August 1755): month_gte=3&month_lte=8&year_gte=1739&year_lte=1755

For the year range only: year_gte=1739&year_lte=1755

A single year: year_gte=1739&year_lte=1739

offences

The top level category breaking peace (including all subcategories): offence=breakingPeace

For wounding, subcategory of breakingPeace: offence=wounding

For subcategories you don’t need to specify the parent category except for “other” subcategories

  • offence=breakingPeaceOther / offence=deceptionNoDetail

“NoDetail” is a new subcategory which essentially means the same thing as “Other” but reflects some inconsistencies in the XML where the offence was tagged without any subcategory at all (but should properly have been Other). In the old API I think there was no way to pull these out separately.

verdicts and sentences

A verdict of guilty: verdict=guilty

A sentence of death: punishment=death

pleas (new)

Pleaded guilty: plea=guilty

I think all the old “pleaded” verdicts are still available under verdict and on a superficial inspection seem to get much the same results. Eg plea=guilty and verdict=pleadedGuilty seem to get the same results.

However, this new option is the result of a fairly recent project and there may well be details I don’t know about in the background information on the website.

item types

To restrict to trial texts only (ie excluding advertisements, supplementary, etc): div_type=trialAccount

This is likely to make fairly slight differences when doing structured offence/verdict/sentence queries but will be more relevant for text queries. It’s recommended if you want to be absolutely consistent and certain about where results come from.

defendants and victims

Eg for gender:

  • defendant_gender=female
  • victim_gender=male

The JSON

There is a lot of… stuff… in the JSON returned by the API, much of it not useful for data analysis. This focuses on some bits you might care about.

search results

Browse a sample of the search results JSON:

jsonedit(sample_search_json)

(search for housebreaking between 1700 and 1704)

The number of results (hits > total)

sample_search_json$hits$total
[1] 14

Trial IDs (hits > hits > _source > idkey) - needed to get full data of individual trials via the single record endpoint.

sample_search_json$hits$hits$`_source`$idkey
 [1] "t17000115-17" "t17000115-21" "t17000828-4"  "t17000828-13" "t17000828-14"
 [6] "t17000828-28" "t17000828-29" "t17021014-5"  "t17021014-9"  "t17021014-19"

The search returns very limited information about each trial (or other item); it does include a text snippet and the page title for each item, which could be useful for some purposes.

sample_search_json$hits$hits$`_source`$text[[1]]
[1] "Anne Buckler and Elizabeth Jefferys , both of the Parish of St. Andrews Holborne , the first was Indicted as Principal, and the other as Accessary before the Fact committed, for breaking the House of Thomas Beckenson , on the 9th of December last, with an Intention to steal his Goods . The Evidence not being sufficient to Convict the Principal they were both acquitted ."
sample_search_json$hits$hits$`_source`$title[[1]]
[1] "Anne Buckler. Elizabeth Jefferys. Theft; housebreaking. 15th January 1700."

single records

The single record endpoint is needed to get the full data about items.

It includes three versions of the trial/item text; two are most likely to be of interest for data analysis.

  • xml - the full tagged XML (hits > hits > _source > xml)
  • text - a plain text version (hits > hits > _source > text)

plus

  • metadata - this contains the same information as the simplified summary table at the top of each trial page, which might be sufficient for some purposes (hits > hits > _source > metadata)

Browse a sample trial JSON:

jsonedit(sample_trial_json)
## sample_trial_json$hits$hits$`_source`$xml

For users of the old OBAPI

A few notes about the major differences between new and old API.

changes to URL query strings

Previously for offences/verdicts/sentences in the URL it was always necessary to explicitly spell out category_subcategory. That’s no longer necessary except in a few specific contexts (and it’s done slightly differently).

An example: offence “fraud”, subcategory of deception. The relevant bit of a search URL previously looked like this:

_offences_offenceCategory_offenceSubcategory=deception_fraud

That’s been replaced by the much shorter and simpler

offence=fraud

Now the top level category is only needed if

  • searching for the whole category, eg: offence=deception
  • searching for subcategories Other or NoDetail, which need to be like this
    • offence=deceptionOther (previously deception_other)
    • offence=deceptionNoDetail (not in the previous version)

what’s gone away

The new API is in many ways more powerful than the old and with fewer limits, but it’s also more generic and some functionality is no longer available (as far as I can tell, though the lack of documentation can make things hard to find).

  • the terms endpoint which gave a list of all the fields and values available in the API
  • the option (breakdown=[field]) for a breakdown of subcategories and so on for a search
  • the option (return=zip) to download zip archives of multiple trials’ XML

The functionality can be reproduced but usually with some loss of convenience; if you have scripts using them you’ll probably find that their replacements need to be longer and more complicated.

The OBAPI demonstrator has also been withdrawn; this has no replacement.

It used to be possible to specify the date of a particular session of the Proceedings (in fact, session dates may have been the only way to limit by date, though my memory is fuzzy on this). That no longer seems to be an option; there’s only month and year. Month-year can often be used as a proxy for session date, but isn’t completely perfect; after 1834 there could be more than one session in the same month.