371 MB XML file from Google full of bookish goodness

By Bryan Rasmussen
June 25, 2008

I tried to fit two opposing feelings into the title of this post. On the one hand, Google has compiled a database of all U.S. books published between 1923 and 1963 that have lost copyright protection because they were not renewed, which is wonderful. On the other hand, they have made it available as a 371 MB XML file (inside a 56 MB zip file that also contains a Readme and a PDF document showing communications with Carl Malamud of public.resource.org and the U.S. Copyright Office), which is not so wonderful, because hey, 371 MB XML file!!

The zip file holding the XML is here. Everybody, start your streaming.

An example of the XML format is shown below:

<Title>In silenzio</Title>
<Name>Stefano Pirandello</Name>
<Name>Fausto Pirandello</Name>
<Name>Rosalia Lietta Aguirre Pirandello</Name>
<Published>His Novelle per un anno, v. 6</Published>

In silenzio. (His Novelle per un
anno, v. 6) © 1Jan23, AF12792.
R99258, 29Aug52, Stefano Pirandello,
Fausto Pirandello & Rosalia (Lietta)
Aguirre Pirandello (C)
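
For anyone who does want to start streaming, here is a minimal sketch of reading a file like this one record at a time in Python, so the whole 371 MB never has to sit in memory. It uses the standard library's xml.etree.ElementTree.iterparse. The Title, Name and Published element names come from the sample above; the wrapping element name "Record" and the file name in the usage note are assumptions on my part, so check the Readme in the zip and adjust them.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag="Record"):
    """Yield one dict per record without loading the whole file.

    record_tag is an ASSUMED name for the element that wraps each entry;
    the sample above only shows the Title, Name and Published children.
    """
    context = iter(ET.iterparse(source, events=("start", "end")))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            yield {
                "title": elem.findtext(".//Title"),
                "names": [name.text for name in elem.iter("Name")],
                "published": elem.findtext(".//Published"),
            }
            root.clear()  # discard records already handled so memory stays flat


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file, built from the sample above.
    # The <Records>/<Record> wrappers are hypothetical.
    sample = (
        "<Records><Record>"
        "<Title>In silenzio</Title>"
        "<Name>Stefano Pirandello</Name>"
        "<Name>Fausto Pirandello</Name>"
        "<Name>Rosalia Lietta Aguirre Pirandello</Name>"
        "<Published>His Novelle per un anno, v. 6</Published>"
        "</Record></Records>"
    ).encode("utf-8")
    for record in iter_records(io.BytesIO(sample)):
        print(record["title"], record["names"])
```

To run it against the real thing, pass the path of the extracted XML file instead of the BytesIO object, for example iter_records("renewals.xml"), where "renewals.xml" is a placeholder name. Clearing the root after each record is what keeps memory use roughly constant; without it the parser quietly builds the full tree anyway.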
