Citation Management
Citation Management automation and integration for research and references
Citation Management is a community skill for organizing and formatting academic citations programmatically, covering reference parsing, bibliography generation, citation style formatting, DOI resolution, and integration with reference databases for scholarly writing workflows.
What Is This?
Overview
Citation Management provides patterns for handling academic references in research software. It covers parsing citation data from BibTeX, RIS, and CSL-JSON formats into structured objects, formatting citations according to styles such as APA, MLA, Chicago, and IEEE, resolving DOIs and ISBNs to retrieve complete metadata from CrossRef and OpenLibrary, generating bibliographies in multiple output formats including HTML and plain text, and deduplicating reference collections by matching on DOI, title similarity, and author overlap. The skill enables developers to build tools that automate citation tasks in academic writing pipelines.
Who Should Use This
This skill serves developers building academic writing tools that need reference management features, researchers automating bibliography generation for papers and reports, and teams creating publishing pipelines that enforce consistent citation formatting.
Why Use It?
Problems It Solves
Manually formatting citations in different styles is error-prone when switching between journal requirements. Parsing BibTeX and RIS files requires handling format quirks and encoding variations across export tools. Resolving incomplete references requires API calls to multiple services. Detecting duplicate references in merged bibliographies needs fuzzy matching that accounts for name variations.
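As one illustration of the format quirks mentioned above, the sketch below reads RIS records into plain dictionaries. It assumes the common line layout of a two-character tag, two spaces, and a hyphen, and it treats a small set of tags as repeatable. The names `parse_ris`, `TAG_LINE`, and `REPEATABLE` are hypothetical, and real exports may deviate from this layout (for example with wrapped continuation lines, which this sketch ignores).

```python
import re

# RIS lines look like "TY  - JOUR": a two-character tag, two spaces, a hyphen.
# Some exporters drop the space after the hyphen, so it is optional here.
TAG_LINE = re.compile(r"^([A-Z][A-Z0-9])  -\s?(.*)$")
# Tags that may repeat within one record (an assumed subset, not a full list).
REPEATABLE = {"AU", "A1", "A2", "KW"}


def parse_ris(content: str) -> list[dict]:
    """Parse RIS text into one dictionary per record (hypothetical helper)."""
    records = []
    current = None
    # Some tools prepend a UTF-8 byte order mark; remove it before parsing.
    for line in content.lstrip("\ufeff").splitlines():
        m = TAG_LINE.match(line)
        if not m:
            continue  # blank or non-tag lines are skipped in this sketch
        tag, value = m.group(1), m.group(2).strip()
        if tag == "TY":
            # TY opens a new record.
            current = {"TY": value}
        elif tag == "ER":
            # ER closes the record that is currently open, if any.
            if current is not None:
                records.append(current)
            current = None
        elif current is not None:
            if tag in REPEATABLE:
                current.setdefault(tag, []).append(value)
            else:
                current[tag] = value
    return records
```

Repeatable tags are stored as lists so that author order is preserved; every other tag keeps its last value.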
Core Highlights
BibTeX parser reads .bib files into structured Reference objects. Style formatter applies APA, MLA, Chicago, and IEEE templates to produce formatted citations. DOI resolver fetches complete metadata from CrossRef for partial references. Deduplication engine identifies duplicates using similarity thresholds.
How to Use It?
Basic Usage
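```python
import re
from dataclasses import dataclass


@dataclass
class Reference:
    title: str
    authors: list[str]
    year: int
    journal: str = ""
    doi: str = ""
    volume: str = ""
    pages: str = ""


# A brace group that may contain one further level of braces, e.g. {A {B}ig Study}.
BRACED = r"\{(?:[^{}]|\{[^{}]*\})*\}"
# Body of an "@type{...}" entry; @comment, @string and @preamble are skipped.
ENTRY_RE = re.compile(
    r"@(?!(?:comment|string|preamble)\b)\w+\s*\{((?:[^{}]|"
    + BRACED + r")*)\}", re.IGNORECASE)
# name = {value}, name = "value", or a bare number such as year = 2020.
FIELD_RE = re.compile(
    r"(\w+)\s*=\s*(?:(" + BRACED + r')|"([^"]*)"|(\d+))')


def parse_bibtex(content: str) -> list[Reference]:
    refs = []
    for entry in ENTRY_RE.findall(content):
        fields = {}
        for m in FIELD_RE.finditer(entry):
            braced, quoted, bare = m.group(2, 3, 4)
            raw = braced[1:-1] if braced else (quoted or bare or "")
            # Drop protective inner braces and collapse line breaks.
            value = " ".join(raw.replace("{", "").replace("}", "").split())
            fields[m.group(1).lower()] = value
        # An entry without an author field yields an empty author list.
        authors = [a.strip() for a in
                   re.split(r"\s+and\s+", fields.get("author", ""))
                   if a.strip()]
        # Tolerate missing or non-numeric years such as "2020a" or "in press".
        year = re.search(r"\d{4}", fields.get("year", ""))
        refs.append(Reference(
            title=fields.get("title", ""),
            authors=authors,
            year=int(year.group()) if year else 0,
            journal=fields.get("journal", ""),
            doi=fields.get("doi", ""),
            volume=fields.get("volume", ""),
            pages=fields.get("pages", "")))
    return refs


def format_apa(ref: Reference) -> str:
    # An approximation of APA layout: up to three authors, then "et al.".
    auth = ", ".join(ref.authors[:3])
    if len(ref.authors) > 3:
        auth += ", et al."
    cite = f"{auth} ({ref.year}). {ref.title}."
    if ref.journal:
        # Asterisks mark the journal and volume as italic (Markdown style).
        cite += f" *{ref.journal}*"
        if ref.volume:
            cite += f", *{ref.volume}*"
        if ref.pages:
            cite += f", {ref.pages}"
        cite += "."
    if ref.doi:
        cite += f" https://doi.org/{ref.doi}"
    return cite
```
Real-World Examples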
```python
import httpx
from difflib import SequenceMatcher

# Reference is the dataclass defined under Basic Usage.


class DOIResolver:
    def __init__(self):
        self.http = httpx.Client(timeout=15)

    def resolve(self, doi: str) -> Reference:
        resp = self.http.get(
            f"https://api.crossref.org/works/{doi}")
        # Fail loudly on unknown DOIs or server errors.
        resp.raise_for_status()
        data = resp.json()["message"]
        authors = [
            f"{a.get('family', '')}, {a.get('given', '')}".strip(", ")
            for a in data.get("author", [])]
        # List-valued fields may be present but empty, so guard the [0] lookups.
        title = (data.get("title") or [""])[0]
        journal = (data.get("container-title") or [""])[0]
        date_parts = data.get("published", {}).get("date-parts") or [[0]]
        return Reference(
            title=title,
            authors=authors,
            year=(date_parts[0] or [0])[0] or 0,
            journal=journal,
            doi=doi)


class Deduplicator:
    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold

    def is_duplicate(self, a: Reference,
                     b: Reference) -> bool:
        # When both records carry a DOI, the DOI alone decides.
        if a.doi and b.doi:
            return a.doi.lower() == b.doi.lower()
        # Otherwise fall back to fuzzy title similarity.
        ratio = SequenceMatcher(
            None, a.title.lower(),
            b.title.lower()).ratio()
        return ratio >= self.threshold

    def deduplicate(
        self, refs: list[Reference]
    ) -> list[Reference]:
        # Keep the first occurrence of each reference; this is O(n^2) in pairs.
        unique = []
        for ref in refs:
            if not any(self.is_duplicate(ref, u)
                       for u in unique):
                unique.append(ref)
        return unique
```
Advanced Tips
Normalize author names to a consistent format before deduplication to handle variations like first name abbreviation. Cache CrossRef responses to avoid repeated lookups. Validate BibTeX entries for required fields before formatting.
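The name normalization tip can be sketched as follows. This is a minimal illustration, assuming that reducing each name to "family name plus first initial" is an acceptable comparison key; `author_key` and `author_overlap` are hypothetical helpers, and multi-word family names written without a comma (such as "Jan van der Berg") are not handled correctly by this simple split.

```python
import unicodedata


def author_key(name: str) -> str:
    """Reduce a name to 'family initial' form, e.g. 'smith j' (hypothetical helper)."""
    # Decompose accented characters and drop the combining marks.
    plain = (unicodedata.normalize("NFKD", name)
             .encode("ascii", "ignore").decode("ascii"))
    if "," in plain:
        # "Family, Given" order.
        family, _, given = plain.partition(",")
    else:
        # "Given Family" order: assume the last token is the family name.
        parts = plain.split()
        family = parts[-1] if parts else ""
        given = " ".join(parts[:-1])
    given_tokens = given.replace(".", " ").split()
    initial = given_tokens[0][0] if given_tokens else ""
    return f"{family.strip()} {initial}".strip().lower()


def author_overlap(a: list[str], b: list[str]) -> float:
    """Jaccard overlap of normalized author keys; 0.0 when both lists are empty."""
    keys_a = {author_key(n) for n in a}
    keys_b = {author_key(n) for n in b}
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0
```

An overlap score like this could be combined with the title similarity check when neither record has a DOI, so that abbreviated first names do not prevent a match.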
When to Use It?
Use Cases
Build a bibliography formatter that converts a BibTeX file into a styled reference list for a manuscript. Create a reference validator that resolves DOIs and fills in missing metadata for incomplete entries. Implement a deduplication tool that merges reference libraries from multiple collaborators into a clean collection.
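For the reference validator use case, one possible way to fill in missing metadata is to copy values from a resolved record only where the local record is empty, so that hand-edited fields are never overwritten. The sketch below uses a trimmed stand-in for the Reference dataclass to stay self-contained; `fill_missing` is a hypothetical helper name.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class Reference:
    # Trimmed stand-in for the Reference dataclass, for illustration only.
    title: str
    authors: list[str]
    year: int
    journal: str = ""
    doi: str = ""


def fill_missing(local: Reference, resolved: Reference) -> Reference:
    """Return a copy of `local` with empty fields taken from `resolved`."""
    updates = {
        f.name: getattr(resolved, f.name)
        for f in fields(local)
        # Empty strings, empty lists and a year of 0 all count as missing.
        if not getattr(local, f.name) and getattr(resolved, f.name)
    }
    # dataclasses.replace builds a new object, leaving `local` untouched.
    return replace(local, **updates)
```

Here `resolved` would typically come from a DOI lookup, while `local` is the incomplete entry parsed from a collaborator's file.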
Related Topics
Academic writing tools, BibTeX format, citation style languages, DOI resolution, and reference database management.
Important Notes
Requirements
Python 3.9 or later (the examples use built-in generic types such as list[str]) with an HTTP client library such as httpx for DOI resolution. BibTeX or RIS files for reference input parsing. Knowledge of target citation styles for formatting configuration.
Usage Recommendations
Do: normalize author name formats when merging references from different sources. Use DOI as the primary identifier for deduplication when available. Validate formatted output against the target journal style guide before submission.
Don't: rely solely on title matching for deduplication, which fails on minor title variations. Assume all BibTeX files use consistent encoding without checking for UTF-8 and LaTeX escape sequences. Hard-code citation formats instead of using configurable templates that support multiple styles.
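To illustrate the point about configurable templates, the sketch below keeps each style as a data entry rather than a dedicated function. The style names, the template strings, and `format_citation` are all hypothetical and only loosely resemble APA and IEEE; faithful styles have many more rules and are usually better served by a Citation Style Language processor.

```python
# Hypothetical style table: each style is a str.format template.
STYLE_TEMPLATES = {
    "apa-like": "{authors} ({year}). {title}. {journal}.",
    "ieee-like": '{authors}, "{title}," {journal}, {year}.',
}


def format_citation(style: str, **parts: str) -> str:
    """Render citation parts with the template registered for `style`."""
    try:
        template = STYLE_TEMPLATES[style]
    except KeyError:
        raise ValueError(f"unknown style: {style}") from None
    # A part missing from `parts` raises KeyError, which surfaces bad input early.
    return template.format(**parts)
```

Adding a style then means adding one dictionary entry, and the table could equally be loaded from a configuration file.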
Limitations
CrossRef API coverage varies and some DOIs may not return complete metadata. BibTeX parsing cannot handle all non-standard format variations produced by different reference managers. Citation style formatting for edge cases like institutional authors may require manual adjustment.
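Because resolved or parsed records may be incomplete, it can help to flag missing fields before formatting so that they are reviewed manually. The sketch below assumes a small table of required fields for two standard BibTeX entry types; the fallback of requiring only a title for other types is an assumption of this example, and `missing_fields` is a hypothetical helper.

```python
# Required fields for two standard BibTeX entry types (an assumed subset).
REQUIRED_FIELDS = {
    "article": ("author", "title", "journal", "year"),
    "inproceedings": ("author", "title", "booktitle", "year"),
}


def missing_fields(entry_type: str, fields: dict[str, str]) -> list[str]:
    """List required fields that are absent or blank for this entry type."""
    # Unknown entry types fall back to requiring a title (assumption).
    required = REQUIRED_FIELDS.get(entry_type.lower(), ("title",))
    return [name for name in required
            if not fields.get(name, "").strip()]
```

An empty result means the entry has every field this table asks for; it does not guarantee that the values themselves are correct.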