Temporary Format Variations

The HTML Content Format page describes the target format. Not all opinions use that format yet. This page documents the temporary variations that exist across the ~13M opinions in the database while older processing pipelines are being phased out.

Detailed format references (internal):

  • footnote_formats.md / footnote_formats_compact.md — 5 formats
  • page_number_formats.md — 3 formats
  • blockquote_formats.md — 2 formats
  • citation_formats.md — 3 formats (cases, statutes, CL spans)
  • case_header_formats.md — 3 formats
  • outlier_sources.md — 3 low-volume sources (<10K) with unique edge-case formats

Case Header

Final — ~1.86M opinions. Structured <details> with per-field classes and metadata attributes. Example: 1195185

<details class="midpage-case-info" open>
  <summary>Case Information</summary>
  <p class="midpage-case-name" shortName="Doe v. Acme">DOE v. ACME CORP.</p>
  <p class="midpage-docket" docket="23-1456">No. 23-1456</p>
  <p class="midpage-court" court="9th Cir.">United States Court of Appeals, Ninth Circuit</p>
  <p class="midpage-date" date="2024-03-15">March 15, 2024</p>
</details>

Temporary — ~3.75M opinions. Unstructured <details> with no field classes or metadata attributes, only raw styled text from the PDF. Example: 1000020258241

<details class="midpage-case-info">
  <summary class="midpage-case-info-summary">Case Information</summary>
  <p><span style="font-weight: bold;">No. 05-13-01080-CV
    BRETT SHIPP, Appellant V. DR. RICHARD MALOUF, Appellees</span></p>
</details>

None — ~7.4M bulk-imported opinions have no <details> tag at all. Example: 3141163
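
A consumer that has to handle all three header variants can branch on the markup itself. The following is a minimal sketch, not part of any Midpage tooling: the function name classify_case_header and its return labels are illustrative, and it assumes the class names shown in the examples above are stable.

```python
import re

# Opening <details> tag carrying the midpage-case-info class (either variant).
_DETAILS_RE = re.compile(
    r'<details\b[^>]*\bclass="[^"]*\bmidpage-case-info\b[^"]*"[^>]*>', re.I
)
# Per-field classes that only the final format puts inside the block.
_FIELD_RE = re.compile(
    r'class="[^"]*\bmidpage-(?:case-name|docket|court|date)\b', re.I
)


def classify_case_header(html: str) -> str:
    """Return 'final', 'temporary', or 'none' for one opinion's HTML."""
    match = _DETAILS_RE.search(html)
    if match is None:
        return "none"
    # Inspect only the header block, not the opinion body that follows it.
    end = html.find("</details>", match.end())
    block = html[match.end(): end if end != -1 else len(html)]
    return "final" if _FIELD_RE.search(block) else "temporary"
```

Regex matching is enough here because the check only needs the presence of known class names; extracting the field values themselves is better done with a real HTML parser.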


Page Numbers

Final — ~1.86M opinions. Custom <midpage-ps> element. Example: 1195185

<midpage-ps n="124"/>

Temporary — ~3.75M opinions. PDF/OCR parser uses a <span>. Example: 1000020258241

<span class="star-pagination" source="midpage-pdf-parser">*124</span>

Temporary — ~6M harvard_bookscan opinions use an XML element. Example: 5445171

<page-number citation-index="1" label="303">*303</page-number>

None — ~1.4M bulk-imported HTML opinions have no page markers. Example: 3141163
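
Until the older markers are migrated, code that needs page breaks has to accept all three variants. The sketch below reads the page label from whichever marker is present. The function name page_labels is illustrative, and the patterns assume the attribute names and the leading asterisk appear exactly as in the examples above.

```python
import re

# One alternative per page-marker variant described above.
_PAGE_MARKER_RE = re.compile(
    r'<midpage-ps\b[^>]*\bn="(?P<final>[^"]+)"[^>]*/?>'
    r'|<span\b[^>]*\bclass="star-pagination"[^>]*>\*?(?P<pdf>[^<]+)</span>'
    r'|<page-number\b[^>]*\blabel="(?P<harvard>[^"]+)"[^>]*>[^<]*</page-number>',
    re.I,
)


def page_labels(html: str) -> list[str]:
    """Return page labels in document order, whichever variant is used."""
    labels = []
    for m in _PAGE_MARKER_RE.finditer(html):
        # Exactly one named group is set per match.
        label = m.group("final") or m.group("pdf") or m.group("harvard")
        labels.append(label.strip())
    return labels
```

An empty result means either a bulk-imported HTML opinion with no page markers or a marker shape this sketch does not cover.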


Footnotes

Final — ~1.66M opinions. Custom elements with bidirectional anchor links. Example: 1195185

<midpage-fnmark id="fnref-1" n="1"><a href="#fn-1">1</a></midpage-fnmark>
...
<section class="midpage-footnotes">
  <midpage-fn id="fn-1" n="1">
    <a class="midpage-fn-backlink" href="#fnref-1">1</a>
    The footnote text...
  </midpage-fn>
</section>

Temporary (v2) — ~195K opinions. Simpler self-closing marks, no anchor links. Example: 311aab31-9c0e-441e-9c49-ada0825bdc03

<midpage-fnmark n="1"/>
...
<midpage-fn n="1">The footnote text...</midpage-fn>

Temporary — ~3.75M opinions. PDF/OCR parser uses class-based <sup> / <p>. Example: 1000020258241

<sup class="midpage_footnotemark_183">[1]</sup>
...
<p class="midpage_footnote_183">[1] The footnote text...</p>

Temporary — ~7.39M bulk-imported opinions use anchor-based <a> / <div>. Example: 1433250

<a class="footnote" href="#fn1" id="fn1_ref">1</a>
...
<div class="footnotes">
  <div class="footnote" id="fn1" label="1">
    <a class="footnote" href="#fn1_ref">1</a>
    <p>The footnote text...</p>
  </div>
</div>
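
The four footnote variants can be distinguished by their marker elements alone. The sketch below checks signatures from most to least specific. The format labels and the function name detect_footnote_format are illustrative, and the pattern for the PDF/OCR variant assumes the numeric class suffix (183 in the example) varies between documents.

```python
import re

# Checked in order, most specific first; the labels are illustrative names.
_FOOTNOTE_SIGNATURES = [
    # Final: marks carry an id that the backlink points to.
    ("final", re.compile(r'<midpage-fnmark\b[^>]*\bid="fnref-')),
    # Temporary (v2): same element, but no id or anchor.
    ("temporary_v2", re.compile(r"<midpage-fnmark\b")),
    # PDF/OCR parser: class-based <sup> / <p> with a numeric suffix.
    ("pdf_parser", re.compile(r'class="midpage_footnote(?:mark)?_\d+"')),
    # Bulk import: anchor-based <a class="footnote">.
    ("bulk_import", re.compile(r'<a\b[^>]*\bclass="footnote"')),
]


def detect_footnote_format(html: str) -> str:
    """Return the first matching format label, or 'none'."""
    for name, pattern in _FOOTNOTE_SIGNATURES:
        if pattern.search(html):
            return name
    return "none"
```

A result of "none" does not identify the pipeline: an opinion without footnotes looks the same in every format.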


Blockquotes

Final — All Midpage-processed opinions (~5.6M) use standard <blockquote>. Example: 1195185

<blockquote>
  <p>All disputes shall be resolved by binding arbitration.</p>
</blockquote>

None — Bulk-imported opinions (~7.4M) have no structural blockquote markup. Quoted text uses tab indentation or <pre> blocks. Example: 3141163


Case Citations

Final — ~1.86M opinions. Structured <midpage-case> with metadata attributes. Example: 1195185

<midpage-case case="Smith v. Jones" cite="262 F.3d 305" pinpoint="320"
  court="4th Cir." date="2001">
  <i>Smith v. Jones</i>, 262 F.3d 305, 320 (4th Cir. 2001)
</midpage-case>

Temporary — ~7.4M bulk-imported opinions use CourtListener citation spans. Example: 3141163

<span class="citation" data-id="2417455">
  <a href="/opinion/2417455/olivo-v-state/#522">918 S.W.2d 519, 522</a>
</span>
<span class="citation no-link">Tex. Fam. Code Ann. § 263.405</span>

None — PDF/OCR parser opinions (~3.75M) have no citation markup. Example: 1000020258241
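
Both marked-up citation variants can be read with a single pass of Python's standard html.parser. The sketch below collects the tag attributes and the visible text of each citation. CitationCollector and extract_citations are illustrative names, not an existing API, and the sketch assumes citation elements are not nested inside one another.

```python
from html.parser import HTMLParser


class CitationCollector(HTMLParser):
    """Collect <midpage-case> elements and CourtListener citation spans."""

    def __init__(self):
        super().__init__()
        self.citations = []  # one dict per citation, in document order
        self._open = None    # citation currently being read, if any
        self._tag = None     # tag name that opened it

    def handle_starttag(self, tag, attrs):
        if self._open is not None:
            return  # inner tags such as <i> or <a> only contribute text
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "midpage-case":
            fmt = "final"
        elif tag == "span" and "citation" in classes:
            fmt = "courtlistener"
        else:
            return
        self._open = {"format": fmt, "attrs": attrs, "text": ""}
        self._tag = tag

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._tag:
            # Collapse the indentation and newlines inside the element.
            self._open["text"] = " ".join(self._open["text"].split())
            self.citations.append(self._open)
            self._open = None
            self._tag = None


def extract_citations(html: str) -> list[dict]:
    collector = CitationCollector()
    collector.feed(html)
    collector.close()
    return collector.citations
```

For the final format the structured fields are in the attrs entry (for example attrs["cite"]). For CourtListener spans, attrs.get("data-id") is None when the span has the no-link class, as in the statute example above. PDF/OCR parser opinions yield an empty list.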