Citation Management

Automation and integration patterns for managing research citations and references

Citation Management is a community skill for organizing and formatting academic citations programmatically, covering reference parsing, bibliography generation, citation style formatting, DOI resolution, and integration with reference databases for scholarly writing workflows.

What Is This?

Overview

Citation Management provides patterns for handling academic references in research software. It covers parsing citation data from BibTeX, RIS, and CSL-JSON formats into structured objects, formatting citations according to styles such as APA, MLA, Chicago, and IEEE, resolving DOIs and ISBNs to retrieve complete metadata from CrossRef and OpenLibrary, generating bibliographies in multiple output formats including HTML and plain text, and deduplicating reference collections by matching on DOI, title similarity, and author overlap. The skill enables developers to build tools that automate citation tasks in academic writing pipelines.

Who Should Use This

This skill serves developers building academic writing tools that need reference management features, researchers automating bibliography generation for papers and reports, and teams creating publishing pipelines that enforce consistent citation formatting.

Why Use It?

Problems It Solves

Manually formatting citations in different styles is error-prone when switching between journal requirements. Parsing BibTeX and RIS files requires handling format quirks and encoding variations across export tools. Resolving incomplete references requires API calls to multiple services. Detecting duplicate references in merged bibliographies needs fuzzy matching that accounts for name variations.

Core Highlights

The BibTeX parser reads .bib files into structured Reference objects. The style formatter applies templates for styles such as APA, MLA, Chicago, and IEEE to produce consistently formatted citations. The DOI resolver fetches complete metadata from CrossRef for partial references. The deduplication engine identifies duplicates using DOI matches and title-similarity thresholds.

How to Use It?

Basic Usage

import re
from dataclasses import dataclass, field

@dataclass
class Reference:
    title: str
    authors: list[str]
    year: int
    journal: str = ""
    doi: str = ""
    volume: str = ""
    pages: str = ""

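def parse_bibtex(content: str) -> list[Reference]:
    # Split on entry headers such as "@article{"; a single whole-entry
    # regex breaks on the nested braces inside field values.
    chunks = re.split(r"@\w+\s*\{", content)[1:]
    refs = []
    for chunk in chunks:
        fields = {}
        # Match name = {value} (one level of nested braces),
        # name = "value", or a bare number such as year = 2020.
        for m in re.finditer(
                r'(\w+)\s*=\s*(?:\{((?:[^{}]|\{[^{}]*\})*)\}'
                r'|"([^"]*)"|(\d+))',
                chunk):
            value = m.group(2) or m.group(3) or m.group(4) or ""
            fields[m.group(1).lower()] = value.strip()
        # Skip chunks without fields, e.g. @comment blocks.
        if not fields:
            continue
        # Drop empty names so a missing author yields [] and not [""].
        authors = [a.strip() for a in
                   re.split(r"\s+and\s+",
                            fields.get("author", ""))
                   if a.strip()]
        # Tolerate missing or non-numeric years such as "in press".
        year = re.search(r"\d{4}", fields.get("year", ""))
        refs.append(Reference(
            title=fields.get("title", ""),
            authors=authors,
            year=int(year.group()) if year else 0,
            journal=fields.get("journal", ""),
            doi=fields.get("doi", ""),
            volume=fields.get("volume", ""),
            pages=fields.get("pages", "")))
    return refs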

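def format_apa(ref: Reference) -> str:
    # Simplified APA-style output with Markdown italics. APA 7
    # reference lists name up to 20 authors; this sketch truncates
    # after three and leaves author names as given.
    auth = ", ".join(ref.authors[:3])
    if len(ref.authors) > 3:
        auth += ", et al."
    # A year of 0 marks a missing date; APA writes "n.d." for those.
    year = ref.year or "n.d."
    cite = f"{auth} ({year}). {ref.title}."
    if ref.journal:
        cite += f" *{ref.journal}*"
        if ref.volume:
            cite += f", *{ref.volume}*"
        if ref.pages:
            cite += f", {ref.pages}"
        cite += "."
    if ref.doi:
        cite += f" https://doi.org/{ref.doi}"
    return cite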
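
The overview also lists RIS as an input format. The following is a minimal sketch of a RIS reader, assuming the common tag layout (a two-character tag, two spaces, a hyphen, then the value) and a small subset of tags. The names RisRecord, RIS_LINE, and parse_ris are hypothetical, and RisRecord simply mirrors the Reference fields so the block runs on its own.

```python
import re
from dataclasses import dataclass, field


@dataclass
class RisRecord:
    """Hypothetical container mirroring the Reference fields."""
    title: str = ""
    authors: list[str] = field(default_factory=list)
    year: int = 0
    journal: str = ""
    doi: str = ""
    volume: str = ""
    pages: str = ""


# A RIS line looks like "TI  - Some title".
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(content: str) -> list[RisRecord]:
    records = []
    current = None
    start_page = end_page = ""
    for line in content.splitlines():
        m = RIS_LINE.match(line.strip())
        if not m:
            continue  # ignore blank lines and unrecognized content
        tag, value = m.group(1), m.group(2).strip()
        if tag == "TY":  # TY opens a new record
            current = RisRecord()
            start_page = end_page = ""
        elif current is None:
            continue  # tag outside any record
        elif tag in ("TI", "T1"):
            current.title = value
        elif tag in ("AU", "A1"):
            current.authors.append(value)
        elif tag in ("PY", "Y1"):
            # Dates may look like "2020/05/01"; keep the four-digit year.
            year = re.search(r"\d{4}", value)
            current.year = int(year.group()) if year else 0
        elif tag in ("JO", "JF", "T2"):
            current.journal = current.journal or value
        elif tag == "VL":
            current.volume = value
        elif tag == "SP":
            start_page = value
        elif tag == "EP":
            end_page = value
        elif tag == "DO":
            current.doi = value
        elif tag == "ER":  # ER closes the record
            current.pages = "-".join(
                p for p in (start_page, end_page) if p)
            records.append(current)
            current = None
    return records
```

Export tools differ in which tags they emit for journal names and dates, so the tag groups above are a starting point and not a complete mapping.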

Real-World Examples

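import httpx
from difflib import SequenceMatcher
from urllib.parse import quote

# Reference is the dataclass defined in Basic Usage.

class DOIResolver:
    def __init__(self):
        self.http = httpx.Client(timeout=15)

    def resolve(self, doi: str) -> Reference:
        # Percent-encode the DOI, which may contain characters
        # that are not safe in a URL path.
        path = quote(doi, safe="/")
        resp = self.http.get(
            f"https://api.crossref.org/works/{path}")
        # Raise on HTTP errors such as 404 for an unknown DOI.
        resp.raise_for_status()
        data = resp.json()["message"]
        authors = [
            f"{a.get('family', '')}, "
            f"{a.get('given', '')}"
            for a in data.get("author", [])]
        # Fields can be present but empty, so fall back
        # before indexing into them.
        title_parts = data.get("title") or [""]
        journal_parts = data.get("container-title") or [""]
        date_parts = (data.get("published", {})
                      .get("date-parts") or [[0]])
        year = date_parts[0][0] if date_parts[0] else 0
        return Reference(
            title=title_parts[0],
            authors=authors,
            year=year or 0,
            journal=journal_parts[0],
            doi=doi)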

class Deduplicator:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold

    def is_duplicate(self, a: Reference,
                      b: Reference) -> bool:
        if a.doi and b.doi:
            return a.doi.lower() == b.doi.lower()
        ratio = SequenceMatcher(
            None, a.title.lower(),
            b.title.lower()).ratio()
        return ratio >= self.threshold

    def deduplicate(
            self, refs: list[Reference]
            ) -> list[Reference]:
        unique = []
        for ref in refs:
            if not any(self.is_duplicate(ref, u)
                       for u in unique):
                unique.append(ref)
        return unique

Advanced Tips

Normalize author names to a consistent format before deduplication to handle variations such as abbreviated first names and differing name order. Cache CrossRef responses to avoid repeated lookups for the same DOI. Validate BibTeX entries for required fields before formatting.
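
Author normalization can be sketched as follows. This is one possible heuristic, not a bibliographic standard: it reduces each name to a family name plus first initial, and it assumes names arrive either as "Family, Given" or "Given Family". The functions normalize_author and author_overlap are hypothetical helpers; the overlap score could be combined with the title ratio in Deduplicator.is_duplicate.

```python
import re
import unicodedata


def normalize_author(name: str) -> str:
    """Reduce a name to 'family initial', e.g. 'smith j'."""
    # Strip accents so that accented and unaccented spellings match.
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(
        c for c in decomposed if not unicodedata.combining(c))
    if "," in plain:
        # "Family, Given" order
        family, _, given = plain.partition(",")
    else:
        # "Given Family" order: treat the last token as the family name
        parts = plain.split()
        family = parts[-1] if parts else ""
        given = " ".join(parts[:-1])
    family = re.sub(r"[^a-z]", "", family.lower())
    initial = re.search(r"[a-z]", given.lower())
    return f"{family} {initial.group() if initial else ''}".strip()


def author_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard overlap of the normalized author sets (0.0 to 1.0)."""
    set_a = {normalize_author(n) for n in a if n.strip()}
    set_b = {normalize_author(n) for n in b if n.strip()}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)
```

Reducing given names to one initial trades precision for recall: "Smith, John" and "Smith, Jane" collapse to the same key, so the overlap score is best used alongside title similarity instead of on its own.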

When to Use It?

Use Cases

Build a bibliography formatter that converts a BibTeX file into a styled reference list for a manuscript. Create a reference validator that resolves DOIs and fills in missing metadata for incomplete entries. Implement a deduplication tool that merges reference libraries from multiple collaborators into a clean collection.
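
The metadata-filling step of such a validator might look like the sketch below. It repeats the Reference dataclass from Basic Usage so that it runs on its own, and fill_missing is a hypothetical helper. It assumes that an empty string, a zero year, or an empty author list means "missing", and that values already present in the local entry should win over resolved ones.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class Reference:
    title: str
    authors: list[str]
    year: int
    journal: str = ""
    doi: str = ""
    volume: str = ""
    pages: str = ""


def fill_missing(partial: Reference, resolved: Reference) -> Reference:
    """Return a copy of `partial` with empty fields taken from `resolved`."""
    updates = {}
    for f in fields(partial):
        # "", 0, and [] all count as missing.
        if not getattr(partial, f.name):
            updates[f.name] = getattr(resolved, f.name)
    # replace() builds a new object, so `partial` itself is unchanged.
    return replace(partial, **updates)
```

In a pipeline, `resolved` would typically come from DOIResolver.resolve for entries that carry a DOI but lack other fields.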

Related Topics

Academic writing tools, BibTeX format, citation style languages, DOI resolution, and reference database management.

Important Notes

Requirements

Python 3.9 or later (the examples use built-in generic type hints such as list[str]) with an HTTP client library such as httpx for DOI resolution. BibTeX or RIS files for reference input parsing. Knowledge of target citation styles for formatting configuration.

Usage Recommendations

Do: normalize author name formats when merging references from different sources; use the DOI as the primary identifier for deduplication when available; validate formatted output against the target journal style guide before submission.

Don't: rely solely on title matching for deduplication, which fails on minor title variations; assume all BibTeX files use consistent encoding without checking for UTF-8 and LaTeX escape sequences; hard-code citation formats instead of using configurable templates that support multiple styles.
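
One way to keep styles configurable is to store each one as a template string. The sketch below is illustrative only: STYLE_TEMPLATES, TEXT_FIELDS, and format_with_style are hypothetical names, the two templates are loose approximations and not complete APA or IEEE implementations, and references are passed as plain dictionaries to keep the block self-contained.

```python
# Hypothetical style table; each entry is a str.format template.
STYLE_TEMPLATES = {
    "apa-like":
        "{authors} ({year}). {title}. {journal}, {volume}, {pages}.",
    "ieee-like":
        '{authors}, "{title}," {journal}, vol. {volume}, '
        "pp. {pages}, {year}.",
}

# Fields every template may reference, besides the author list.
TEXT_FIELDS = ("title", "year", "journal", "volume", "pages")


def format_with_style(ref: dict, style: str) -> str:
    if style not in STYLE_TEMPLATES:
        raise ValueError(f"unknown style: {style!r}")
    # Default every field to "" so a missing key does not raise KeyError.
    values = {name: ref.get(name, "") for name in TEXT_FIELDS}
    values["authors"] = ", ".join(ref.get("authors", []))
    return STYLE_TEMPLATES[style].format(**values)
```

Adding a style then means adding one dictionary entry. Plain templates leave stray punctuation when a field is empty, so a production formatter would need conditional segments or a CSL processor.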

Limitations

CrossRef API coverage varies and some DOIs may not return complete metadata. BibTeX parsing cannot handle all non-standard format variations produced by different reference managers. Citation style formatting for edge cases like institutional authors may require manual adjustment.