# LLMs for Engineering
I was having a conversation the other day with a friend who said LLMs were all trained on stolen data. I said yes — this is called “primitive accumulation” — it is the origin of capitalism.
But I am not really interested in LLM content generation for a future where we all sleep in pods, Matrix-style, hooked up to VR headsets watching AI-generated entertainment that will probably be better than some entertainment today (reality TV, for example).
What I AM interested in is the relationship between LLMs and engineering, and how LLMs can insert themselves into the engineering process. Build me a component that does X. This can be an electrical component, a chemical component, a mechanical component, a programming component. My experiments have been with programming components, since this is what I understand. But I have also been studying phytochemistry, though not much, so I really don’t know anything.
But I was looking at methylation and wanted to experiment with that.
I asked:
> create an interactive diagram and process methylation — use tryptophan as an example — use python to write the program and use bio chem python packages
The result was a junky, non-interactive diagram generated by Python.
I prompted:
>this is not interactive
And you know what? I got a really sort of fun, but sort of useless, interactive diagram. It took a few prompts to fix because the generated code assumed Jupyter notebooks and I was not using them, but FUN!
Take a look at the HTML code in the git repo. I am into this.
I also wanted to generate the actual methylation process, ideally as an animation with bonds coming and going, but this was not as successful and I gave up.
The amino acid tryptophan is crucial for serotonin production in the gut.
What is interesting is the step-by-step discussion of methylation, which I’ll put here. But I don’t know if this is an AI hallucination.
>The methylation of tryptophan to form methylated tryptophan involves the addition of a methyl group (CH3) to one of the carbon atoms in the tryptophan molecule. Specifically, the process involves the methylation of the indole ring in tryptophan. Here’s an intermediate step-by-step explanation of how this methylation process happens:
1. **Tryptophan Structure**:
The chemical structure of tryptophan contains an indole ring, which is a hexagonal ring fused to a five-membered ring, along with various functional groups. Here’s a simplified representation:
```
H
|
N - C - C - C - C
| | | |
H H H H
```
2. **Methyl Group Addition**:
The methylation process involves adding a methyl group (CH3) to one of the carbon atoms in the indole ring. The exact location of the methylation can vary depending on the enzyme or reaction conditions, but for the sake of illustration, let’s consider methylation at the 7th carbon (C7) of the indole ring. Here’s the structure after methylation:
```
H
|
N - C - C - C - C
| | | |
H H H H
|
CH3
```
3. **Methylated Tryptophan**:
The result is the formation of methylated tryptophan, where the indole ring now has a methyl group attached to one of its carbons:
```
H
|
N - C - C - C - C
| | | |
H H H H
|
CH3
```
In a biological context, the methylation of tryptophan is catalyzed by specific enzymes. The exact mechanism and location of methylation can vary depending on the specific enzyme and the organism in which it occurs. Methylation reactions are essential for various biological processes, including the modification of amino acids, nucleotides, and other biomolecules, which can have significant impacts on cellular function and gene regulation.
OpenAI. (2023). _ChatGPT_ (September 25 Version) [Large language model]. https://chat.openai.com
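One thing in the quoted explanation that can be checked without trusting the chatbot is the bookkeeping: replacing one hydrogen on the ring with a CH3 group adds one carbon and two hydrogens. The sketch below is only an illustration. It assumes the standard formula for tryptophan (C11H12N2O2) and rounded average atomic masses, and the helper names `methylate`, `molar_mass`, and `as_text` are made up for this example.

```python
# Rounded average atomic masses in g/mol (an assumption: good enough for a sanity check).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

# Tryptophan is C11H12N2O2.
TRYPTOPHAN = {"C": 11, "H": 12, "N": 2, "O": 2}


def methylate(formula):
    """Swap one H for a CH3 group: the net change is +1 C and +2 H."""
    product = dict(formula)
    product["C"] += 1
    product["H"] += 2
    return product


def molar_mass(formula):
    """Sum the atomic masses weighted by atom counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def as_text(formula):
    """Render a formula dict as plain text, e.g. C11H12N2O2."""
    return "".join(f"{element}{count}" for element, count in formula.items())


methyl_trp = methylate(TRYPTOPHAN)
for label, formula in [("tryptophan", TRYPTOPHAN), ("methylated", methyl_trp)]:
    print(f"{label:12s} {as_text(formula):12s} {molar_mass(formula):.3f} g/mol")
```

The difference between the two masses comes out to about 14.03 g/mol, which is one CH2 unit. That part of the explanation holds up regardless of which carbon the methyl group lands on.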
When I asked it to generate Python, I got this addition:
>In this code:
1. We create a molecule object for tryptophan using its SMILES notation.
2. We identify the carbon atom (C7) in the indole ring where we want to add the methyl group. In a real biological context, the methylation site would be determined by specific enzymes.
3. We create a methyl group molecule using its SMILES notation.
4. We merge the tryptophan molecule with the methyl group using `Chem.CombineMols`.
5. We add a single bond between the methyl group and the chosen carbon atom to complete the methylation process.
6. Finally, we use RDKit to draw and display the chemical structures of tryptophan and methylated tryptophan.
Please note that this is a simplified representation of the methylation process. In a biological context, methylation is a complex enzymatic process that can involve specific enzymes and co-factors. This code only illustrates the addition of a methyl group to a specific carbon atom for demonstration purposes.
OpenAI. (2023). _ChatGPT_ (September 25 Version) [Large language model]. https://chat.openai.com
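The six steps quoted above can be sketched roughly as follows. This is not the code ChatGPT produced. It is a minimal reconstruction that assumes RDKit is installed, drops stereochemistry from the tryptophan SMILES, and picks C7 with a SMARTS pattern (`[nH]c[cH]`) chosen here for illustration. Like the original, it only edits a molecular graph and says nothing about the real enzymatic mechanism.

```python
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.rdMolDescriptors import CalcMolFormula

# Step 1: build tryptophan from SMILES (stereochemistry left out for simplicity).
tryptophan = Chem.MolFromSmiles("NC(Cc1c[nH]c2ccccc12)C(=O)O")

# Step 2: locate C7 of the indole ring. The pattern reads: the ring NH, then an
# aromatic carbon next to it, then an aromatic CH next to that. In tryptophan
# only the N1 -> C7a -> C7 path fits, because C3 carries the side chain (no H).
pattern = Chem.MolFromSmarts("[nH]c[cH]")
matches = tryptophan.GetSubstructMatches(pattern)
if len(matches) != 1:
    raise ValueError(f"expected exactly one C7 candidate, found {len(matches)}")
c7_idx = matches[0][2]

# Step 3: the methyl group starts out as a lone carbon (methane in SMILES terms).
methyl = Chem.MolFromSmiles("C")

# Step 4: merge both molecules into one editable, still disconnected, molecule.
combined = Chem.RWMol(Chem.CombineMols(tryptophan, methyl))
methyl_idx = tryptophan.GetNumAtoms()  # atoms of the second molecule come last

# Step 5: add the single bond, then sanitize so hydrogen counts are recomputed.
combined.AddBond(c7_idx, methyl_idx, Chem.BondType.SINGLE)
Chem.SanitizeMol(combined)
methylated = combined.GetMol()

print(CalcMolFormula(tryptophan), "->", CalcMolFormula(methylated))
print(Chem.MolToSmiles(methylated))

# Step 6: draw both structures side by side as SVG text (no notebook needed).
svg = Draw.MolsToGridImage(
    [tryptophan, methylated],
    molsPerRow=2,
    legends=["tryptophan", "7-methyltryptophan"],
    useSVG=True,
)
print(f"SVG drawing: {len(svg)} characters")
```

Outside a notebook the `svg` value is plain text, so it can be written to a `.svg` file or pasted into an HTML page, which sidesteps the Jupyter problem mentioned earlier.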
## Other Packages
I also asked ChatGPT what other Python packages are available for chemistry and got the following:
>There are several Python packages and libraries that are widely used for biochemical synthesis, analysis, and visualization. These packages provide tools for various tasks related to biochemistry, from structural analysis to data visualization. Here are some of the most popular ones:
1. **RDKit**: The RDKit is a comprehensive cheminformatics library that allows you to work with chemical structures, fingerprints, and molecular data. It’s widely used in drug discovery and biochemical analysis. It provides functions for chemical structure handling, substructure searching, and more.
   - Website: [https://www.rdkit.org/](https://www.rdkit.org/)
2. **Biopython**: Biopython is a collection of tools, libraries, and scripts for computational biology and bioinformatics. It provides modules for sequence analysis, structural biology, and more.
   - Website: [https://biopython.org/](https://biopython.org/)
3. **Bioconductor**: Bioconductor is a collection of R packages for the analysis and comprehension of high-throughput genomic data. It includes a wide range of tools for bioinformatics and statistical analysis.
   - Website: [https://www.bioconductor.org/](https://www.bioconductor.org/)
4. **PySCeS**: PySCeS is a package for modeling and simulation of biochemical systems. It provides tools for metabolic control analysis, sensitivity analysis, and dynamic modeling of biochemical networks.
   - Website: [http://pysces.sourceforge.net/](http://pysces.sourceforge.net/)
5. **BioPandas**: BioPandas is a library for working with molecular and structural data, similar to Pandas for tabular data. It allows you to manipulate and analyze biological data in a structured way.
   - GitHub: [https://github.com/rasbt/biopandas](https://github.com/rasbt/biopandas)
6. **MDAnalysis**: MDAnalysis is a library for analyzing molecular dynamics (MD) simulation trajectories. It’s commonly used for studying protein structures and dynamics.
   - Website: [https://www.mdanalysis.org/](https://www.mdanalysis.org/)
7. **PyMOL**: PyMOL is a molecular visualization system that can produce high-quality 3D molecular structures and animations. While it’s not a Python library itself, it has a Python API for scripting and automating tasks.
   - Website: [https://pymol.org/](https://pymol.org/)
8. **Matplotlib and Plotly**: These are popular data visualization libraries that can be used for visualizing biochemical data, including plotting protein structures, molecular interactions, and more.
   - Matplotlib: [https://matplotlib.org/](https://matplotlib.org/)
   - Plotly: [https://plotly.com/python/](https://plotly.com/python/)
9. **NetworkX**: NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It’s useful for analyzing biological networks such as protein-protein interaction networks and metabolic pathways.
   - Website: [https://networkx.github.io/](https://networkx.github.io/)
10. **PySCeNe**: PySCeNe is a Python library for the analysis and visualization of cellular and subcellular compartments in spatial single-cell RNA sequencing (scRNA-seq) data.
   - GitHub: [https://github.com/Teichlab/PySCeNe](https://github.com/Teichlab/PySCeNe)
These libraries cover a wide range of tasks in biochemistry, from structural analysis and molecular dynamics to bioinformatics and network analysis. Depending on your specific needs, you can choose the libraries that best suit your projects.
> any more?
ChatGPT replied:
>Certainly, there are more Python packages for biochemical synthesis, analysis, and visualization. Here are a few additional libraries and tools that you might find useful:
11. **ChemPy**: ChemPy is a Python library for modeling and simulating chemical kinetics. It’s particularly useful for modeling chemical reactions and reaction networks.
   - GitHub: [https://github.com/bjodah/chempy](https://github.com/bjodah/chempy)
12. **COBRApy**: COBRApy is a Python library for constraint-based reconstruction and analysis (COBRA) of metabolic networks. It’s used in systems biology for analyzing and simulating metabolic pathways.
   - GitHub: [https://opencobra.github.io/cobrapy/](https://opencobra.github.io/cobrapy/)
13. **Pybel**: Pybel is a Python wrapper for the Open Babel cheminformatics toolkit. It provides tools for chemical file format conversion and molecular descriptor calculations.
   - GitHub: [https://github.com/OpenBabel/pybel](https://github.com/OpenBabel/pybel)
14. **BioPython-Structural**: This is an extension of the Biopython library that focuses on structural biology. It allows you to work with protein structures, perform structural alignments, and more.
   - GitHub: [https://github.com/biopython/biopython-struct](https://github.com/biopython/biopython-struct)
15. **Chemoinformatics Jupyter Notebooks**: A collection of Jupyter notebooks with examples and tutorials on various chemoinformatics tasks, including chemical data analysis and visualization.
   - GitHub: [https://github.com/chemoinformatics/chemoinformatics-jupyter-notebooks](https://github.com/chemoinformatics/chemoinformatics-jupyter-notebooks)
16. **Cytoscape Automation**: Cytoscape is a popular tool for visualizing and analyzing biological networks. Cytoscape Automation provides a Python API for interacting with Cytoscape and automating network analysis.
   - GitHub: [https://github.com/cytoscape/automation](https://github.com/cytoscape/automation)
17. **BiGGER**: BiGGER (Biological Graph and General Entity Renderer) is a Python library for visualizing biological networks and pathway diagrams. It’s designed for creating publication-quality figures.
   - GitHub: [https://github.com/ryaneus/gets](https://github.com/ryaneus/gets)
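Since this list came from a chatbot, one cheap check before building on any of it is whether a package is actually importable in your environment. The sketch below uses only the standard library. The mapping from display name to import name covers the entries whose import names are well established; the other entries are left out rather than guessed, and the `installed` helper is a made-up name for this example.

```python
from importlib.util import find_spec

# Display name -> top-level import name. Only entries with well-known import
# names are listed; anything uncertain is deliberately omitted.
CANDIDATES = {
    "RDKit": "rdkit",
    "Biopython": "Bio",
    "PySCeS": "pysces",
    "BioPandas": "biopandas",
    "MDAnalysis": "MDAnalysis",
    "PyMOL": "pymol",
    "Matplotlib": "matplotlib",
    "Plotly": "plotly",
    "NetworkX": "networkx",
    "ChemPy": "chempy",
    "COBRApy": "cobra",
}


def installed(candidates):
    """Return {display name: True/False} without importing any package."""
    return {name: find_spec(module) is not None for name, module in candidates.items()}


for name, available in installed(CANDIDATES).items():
    print(f"{name:12s} {'installed' if available else 'missing'}")
```

A "missing" result only means the package is not installed locally, not that it does not exist, so it is a first filter rather than a verdict on the list.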