
How to Give Your LLM Access to Scientific Computing

Why LLMs need external tools for science, tool-use approaches (function calling, MCP), and how to connect SciRouter's 30+ scientific tools to GPT-4 and Claude.

Ryan Bethencourt
April 25, 2026
10 min read

The Problem: LLMs Hallucinate Science

Large language models are remarkably good at discussing scientific concepts. Ask Claude about protein folding mechanisms or GPT-4 about drug-likeness rules, and you will get articulate, mostly correct answers. But ask an LLM to actually fold a protein, compute a binding affinity, or calculate molecular weight from a SMILES string, and the results fall apart.

The reason is fundamental: LLMs are text prediction engines, not computation engines. When a model generates a pLDDT score or a LogP value, it is pattern-matching against training data, not running physics simulations or cheminformatics algorithms. The numbers look plausible but are often wrong — sometimes subtly, sometimes wildly. In drug discovery and structural biology, wrong numbers can send entire research programs in the wrong direction.

This is the hallucination problem applied to science, and it is more dangerous than hallucination in casual conversation because the outputs look precise. A hallucinated molecular weight of 342.4 g/mol looks just as authoritative as a computed one, but could be off by 50%.

The Solution: Give LLMs Real Tools

The fix is not to make LLMs better at math (though that helps at the margins). The fix is to give LLMs access to real scientific computing tools — the same tools researchers use — and let the model decide when to call them. Instead of generating a molecular weight from memory, the LLM calls RDKit. Instead of guessing a protein structure, it calls ESMFold.

This pattern is called tool use or function calling, and it transforms LLMs from unreliable science chatbots into legitimate research assistants. The LLM handles natural language understanding, planning, and synthesis. The tools handle computation. Each does what it is good at.

Three Approaches to Tool Use

There are three main approaches for connecting LLMs to scientific tools, each with different trade-offs in complexity, flexibility, and user experience.

1. Function Calling (OpenAI / Anthropic API)

Both GPT-4 and Claude support function calling natively. You define tool schemas in your API request, the model outputs structured JSON when it wants to call a tool, your code executes the tool, and you pass the result back. This gives you full control but requires writing application code for every integration.

Function calling with GPT-4 + SciRouter
import json

import openai
import requests

# Define the tool schema
tools = [{
    "type": "function",
    "function": {
        "name": "fold_protein",
        "description": "Predict 3D structure from amino acid sequence",
        "parameters": {
            "type": "object",
            "properties": {
                "sequence": {
                    "type": "string",
                    "description": "Amino acid sequence"
                }
            },
            "required": ["sequence"]
        }
    }
}]

# The LLM decides whether to call the tool
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Fold FVNQHLCGSHLVEALYLVCGERGFFYTPKT"}],
    tools=tools,
)

# Execute against SciRouter
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    # The API returns the arguments as a JSON string, so parse them first
    args = json.loads(call.function.arguments)
    result = requests.post(
        "https://api.scirouter.ai/v1/proteins/fold",
        headers={"Authorization": "Bearer sk-sci-YOUR_KEY"},
        json={"sequence": args["sequence"]},
        timeout=120,  # avoid hanging forever if the request stalls
    )
    result.raise_for_status()  # surface HTTP errors instead of printing them
    print(result.json())
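
The snippet above stops once the tool has run. The last step described earlier, passing the result back to the model, can be sketched as follows. This is a minimal illustration of the message shape the Chat Completions API expects for tool results (a `tool` role message matched by `tool_call_id`); the helper names and the placeholder result payload are assumptions for illustration, not part of SciRouter's API.

```python
import json


def build_tool_result_message(tool_call_id: str, result: dict) -> dict:
    """Wrap a tool's output as a 'tool' role message tied to the call that requested it."""
    return {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": json.dumps(result),  # tool output is sent back as a string
    }


def extend_conversation(messages: list, assistant_message: dict, tool_message: dict) -> list:
    """Return a new list: prior turns, the assistant turn that asked for the tool, then the result."""
    return [*messages, assistant_message, tool_message]


# Illustrative values only; a real run takes these from the API response.
history = [{"role": "user", "content": "Fold FVNQHLCGSHLVEALYLVCGERGFFYTPKT"}]
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_123",  # hypothetical call id
        "type": "function",
        "function": {
            "name": "fold_protein",
            "arguments": '{"sequence": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"}',
        },
    }],
}
tool_turn = build_tool_result_message("call_123", {"status": "placeholder result"})
followup_messages = extend_conversation(history, assistant_turn, tool_turn)
```

Sending `followup_messages` in a second `chat.completions.create` call lets the model read the computed result and explain it in natural language.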

2. Model Context Protocol (MCP)

MCP is an open standard created by Anthropic that takes a different approach. Instead of defining tool schemas in every request, you run an MCP server that advertises available tools. AI clients like Claude Desktop connect to the server and automatically discover what tools are available. The user never writes integration code — they configure the server once and the AI handles the rest.

SciRouter provides a hosted MCP server at mcp.scirouter.ai that exposes all 30+ scientific computing tools. Configure it in Claude Desktop and you can immediately ask Claude to fold proteins, dock molecules, predict ADMET properties, and more — all through natural conversation.

Tip
MCP is the fastest way to get started. No code required. See our MCP setup guide for step-by-step instructions.
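
For readers curious about what "automatic discovery" means on the wire, here is a rough sketch. MCP messages are JSON-RPC 2.0 requests; after an initial handshake, a client lists the server's tools with `tools/list` and invokes one with `tools/call`. The tool name `fold_protein` and its `sequence` argument are hypothetical here, since the names the SciRouter server actually advertises are not shown in this article.

```python
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request, the envelope MCP messages use."""
    request = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Step 1: the client asks the server which tools it offers.
list_tools = jsonrpc_request("tools/list")

# Step 2: the client invokes a discovered tool by name.
# The tool name and argument below are assumptions for illustration.
call_tool = jsonrpc_request(
    "tools/call",
    {
        "name": "fold_protein",
        "arguments": {"sequence": "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"},
    },
)
```

An MCP client such as Claude Desktop builds and sends these messages for you, which is why no integration code is needed on the user's side.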

3. Agent Frameworks (LangChain, AutoGen)

Agent frameworks like LangChain and AutoGen provide higher-level abstractions for building multi-step workflows. You define tools as Python functions, and the framework handles the loop of LLM reasoning, tool calling, and result integration. This is ideal for building autonomous pipelines that chain multiple SciRouter tools together.

SciRouter's Python SDK integrates directly with LangChain as a toolkit. Import the SciRouterToolkit, pass your API key, and all tools become available to your LangChain agent. See our LangChain integration guide for a complete tutorial.
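
To make the "tools as Python functions" idea concrete without depending on any particular framework, here is a framework-free sketch of the registration and dispatch step that agent frameworks perform internally. Everything in it (`TOOLS`, `tool`, `dispatch`, and the toy `sequence_length` function) is hypothetical and is not the SciRouterToolkit or LangChain API.

```python
from typing import Callable, Dict

# Registry mapping tool names to the Python functions that implement them.
TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(fn: Callable[..., dict]) -> Callable[..., dict]:
    """Register a plain Python function as a callable tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def sequence_length(sequence: str) -> dict:
    """Toy stand-in for a real scientific tool: count residues in a sequence."""
    return {"length": len(sequence)}


def dispatch(name: str, arguments: dict) -> dict:
    """Run the tool the model asked for, or report that the name is unknown."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**arguments)
```

A framework adds the surrounding loop: it sends the tool descriptions to the model, calls something like `dispatch` whenever the model requests a tool, and feeds the result back until the model produces a final answer.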

Example: An LLM Folds a Protein via Function Call

Let's walk through a concrete example. A researcher asks their AI assistant to analyze the insulin B-chain. Here is what happens behind the scenes:

  • The user types: "Fold this protein and tell me about it: FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
  • The LLM recognizes this as an amino acid sequence and decides to call the ESMFold tool
  • The application sends the sequence to SciRouter's /v1/proteins/fold endpoint
  • ESMFold predicts the 3D structure and returns coordinates, pLDDT scores, and metadata
  • The LLM receives the results and explains them in natural language
  • The researcher asks follow-up questions, triggering additional tool calls as needed

The key insight is that the LLM never tries to predict the protein structure itself. It delegates computation to ESMFold, receives real results, and then uses its language abilities to interpret and explain those results. This is the pattern that makes LLM-powered science reliable.

What Tools Should You Connect?

The tools you connect depend on your research domain. Here are the most common categories and the SciRouter tools that serve them:

Structural Biology

  • ESMFold — Protein structure prediction from sequence
  • Boltz-2 — Protein complex and protein-ligand structure prediction
  • Pocket Detection — Find druggable binding sites on protein surfaces

Medicinal Chemistry

Drug Discovery

  • DiffDock — AI-powered molecular docking without search box definition
  • AutoDock Vina — Physics-based molecular docking with scoring functions
  • Format Conversion — Convert between SMILES, InChI, MOL, and PDB formats

Best Practices for LLM Tool Use in Science

Connecting tools to an LLM is the easy part. Making the integration reliable and useful requires attention to a few important practices.

Validate Inputs Before Sending

LLMs sometimes generate malformed inputs — invalid SMILES strings, sequences with non-standard amino acids, or SMILES that parse but represent impossible molecules. Always validate inputs before sending them to computation tools. SciRouter handles this server-side with clear error messages, but catching errors early saves credits and improves user experience.
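
As a minimal sketch of client-side validation, the check below rejects empty sequences and any character outside the 20 standard amino acid one-letter codes. The function name and error wording are illustrative. Validating SMILES properly needs a cheminformatics library (for example, RDKit's `Chem.MolFromSmiles` returns `None` when parsing fails), so it is left out to keep this example dependency-free.

```python
# One-letter codes for the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(sequence: str) -> list:
    """Return a list of problems; an empty list means the sequence looks valid."""
    problems = []
    cleaned = sequence.strip().upper()
    if not cleaned:
        problems.append("sequence is empty")
    invalid = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if invalid:
        problems.append("non-standard residues: " + ", ".join(invalid))
    return problems
```

Running this before each fold request means a malformed sequence produces an immediate, local error message rather than a round trip to the API.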

Show Tool Calls Transparently

When an LLM calls a tool, show the user what happened. Display the tool name, the input parameters, and the raw results before the LLM's interpretation. This builds trust and lets domain experts verify that the computation was set up correctly. The SciRouter Agent Playground does this with tool call badges on every message.

Chain Tools for Multi-Step Workflows

The real power of LLM tool use emerges when you chain tools together. Fold a protein, then dock a ligand against it, then check the ligand's ADMET properties. Each step's output feeds the next. The LLM orchestrates the workflow while each tool handles its specialized computation. This is what the Agent Builder is designed to make easy.
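
The data flow of such a chain can be sketched as below. The three steps are passed in as plain callables so the example stays independent of any specific client library; the function name, parameter names, and the shape of the returned dictionary are assumptions for illustration only.

```python
from typing import Callable


def run_fold_dock_admet(
    sequence: str,
    smiles: str,
    fold: Callable[[str], dict],
    dock: Callable[[dict, str], dict],
    admet: Callable[[str], dict],
) -> dict:
    """Chain three tools: the folded structure feeds docking, then ADMET runs on the ligand."""
    structure = fold(sequence)          # step 1: sequence -> predicted structure
    docking = dock(structure, smiles)   # step 2: structure + ligand -> docking result
    admet_profile = admet(smiles)       # step 3: ligand -> ADMET predictions
    return {"structure": structure, "docking": docking, "admet": admet_profile}
```

In an agent setting the LLM decides this ordering at run time; hard-coding it like this is mainly useful for testing the individual steps or for fixed, repeatable pipelines.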

Set Reasonable Credit Budgets

Multi-step workflows can consume credits quickly. A fold-dock-ADMET pipeline uses roughly 16-18 credits. Set per-conversation or per-session credit budgets so autonomous agents do not run away with your allocation. SciRouter's usage tracking API lets you monitor consumption in real time.
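
One simple way to enforce a budget on the client side is a small guard that every tool call must pass through before it runs. The sketch below is a hypothetical helper, not part of SciRouter's SDK or usage tracking API; the costs in the usage example are the per-tool figures quoted in the FAQ at the end of this article (5 credits for folding, 10 for docking).

```python
class CreditBudgetExceeded(RuntimeError):
    """Raised when a tool call would push spending past the session limit."""


class CreditBudget:
    """Track credits spent in one session and refuse calls that exceed the limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.spent = 0

    @property
    def remaining(self) -> int:
        return self.limit - self.spent

    def charge(self, cost: int, label: str = "tool call") -> None:
        """Reserve credits for a call, or raise before any credits are spent."""
        if self.spent + cost > self.limit:
            raise CreditBudgetExceeded(
                f"{label} costs {cost} credits but only {self.remaining} remain"
            )
        self.spent += cost


# Usage: a 12-credit session can afford a fold (5) but not a fold plus a dock (10).
budget = CreditBudget(limit=12)
budget.charge(5, "fold")
```

Calling `budget.charge(10, "dock")` at this point raises `CreditBudgetExceeded`, so an autonomous agent stops before the docking request is sent.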

The Future: Autonomous Science Agents

Today, most LLM-science integrations are interactive — a researcher asks a question, the LLM calls a tool, and the researcher reviews the result. The next frontier is autonomous agents that can plan and execute entire research workflows with minimal human oversight.

Imagine telling an agent: "Screen this library of 500 compounds against EGFR and give me the top 10 candidates with favorable ADMET profiles." The agent would compute properties for all 500, filter by drug-likeness, dock the survivors, predict ADMET, rank the results, and present a summary — all without human intervention between steps.

This is not science fiction. The individual tools exist today. What is emerging is the orchestration layer that connects them reliably. SciRouter provides the tool infrastructure. MCP and agent frameworks provide the orchestration. LLMs provide the reasoning. The pieces are coming together.

Note
Ready to connect your LLM to real science tools? Start with the MCP setup guide for the fastest path, or explore the LangChain integration for programmatic workflows.

Frequently Asked Questions

Can GPT-4 or Claude do science without external tools?

LLMs can discuss scientific concepts, summarize papers, and suggest experimental designs. However, they cannot perform actual computation — they cannot fold a protein, calculate molecular properties from a SMILES string, or run a docking simulation. For any task that requires numerical precision or physics-based modeling, external tools are essential.

What is function calling and how does it help with science?

Function calling is a capability built into models like GPT-4 and Claude that lets the LLM output structured JSON describing which tool to call and with what parameters. The application then executes the tool and returns results to the LLM. For science, this means the LLM can decide to fold a protein or calculate molecular properties and receive real computed results.

What is the difference between MCP and function calling?

Function calling requires you to define tool schemas in every API request. MCP (Model Context Protocol) is a persistent connection where a tool server advertises its capabilities and the AI client discovers them automatically. MCP is more like a plugin system — configure it once and the tools are always available.

How many credits does a typical LLM science workflow use?

A simple property calculation costs 1 credit. Protein folding costs 5 credits. Molecular docking costs 10 credits. A multi-step workflow like fold-then-dock-then-ADMET typically uses 15-20 credits. SciRouter provides 500 free credits per month on the free tier.

Can I use SciRouter tools with open-source LLMs?

Yes. Any LLM framework that supports function calling or tool use can integrate with SciRouter via the REST API or Python SDK. LangChain, LlamaIndex, and AutoGen all support SciRouter as a tool provider. Open-source models running through these frameworks can call SciRouter endpoints just like GPT-4 or Claude.
