
Commit

Merge pull request #1066 from JohnSnowLabs/chore/final_website_updates
update the README.md file
chakravarthik27 authored Jul 16, 2024
2 parents 2ee2c22 + 7f0f46d commit 1dbc655
Showing 2 changed files with 4 additions and 2 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -105,7 +105,9 @@ You can check out the following LangTest articles:
| [**Testing the Robustness of LSTM-Based Sentiment Analysis Models**](https://medium.com/john-snow-labs/testing-the-robustness-of-lstm-based-sentiment-analysis-models-67ed84e42997) | Explore the robustness of custom models with LangTest Insights.|
| [**LangTest Insights: A Deep Dive into LLM Robustness on OpenBookQA**](https://medium.com/john-snow-labs/langtest-insights-a-deep-dive-into-llm-robustness-on-openbookqa-ab0ddcbd2ab1) | Explore the robustness of Language Models (LLMs) on the OpenBookQA dataset with LangTest Insights.|
| [**LangTest: A Secret Weapon for Improving the Robustness of Your Transformers Language Models**](https://medium.com/john-snow-labs/langtest-a-secret-weapon-for-improving-the-robustness-of-your-transformers-language-models-9693d64256cc) | Explore the robustness of Transformers Language Models with LangTest Insights.|

| [**Mastering Model Evaluation: Introducing the Comprehensive Ranking & Leaderboard System in LangTest**](https://medium.com/john-snow-labs/mastering-model-evaluation-introducing-the-comprehensive-ranking-leaderboard-system-in-langtest-5242927754bb) | The Model Ranking & Leaderboard system by John Snow Labs' LangTest offers a systematic approach to evaluating AI models with comprehensive ranking, historical comparisons, and dataset-specific insights, empowering researchers and data scientists to make data-driven decisions on model performance. |
| [**Evaluating Long-Form Responses with Prometheus-Eval and Langtest**](https://medium.com/john-snow-labs/evaluating-long-form-responses-with-prometheus-eval-and-langtest-a8279355362e) | Prometheus-Eval and LangTest unite to offer an open-source, reliable, and cost-effective solution for evaluating long-form responses, combining Prometheus's GPT-4-level performance and LangTest's robust testing framework to provide detailed, interpretable feedback and high accuracy in assessments. |
| [**Ensuring Precision of LLMs in Medical Domain: The Challenge of Drug Name Swapping**](https://medium.com/john-snow-labs/ensuring-precision-of-llms-in-medical-domain-the-challenge-of-drug-name-swapping-d7f4c83d55fd) | Accurate drug name identification is crucial for patient safety. Testing GPT-4o with LangTest's **_drug_generic_to_brand_** conversion test revealed potential errors in predicting drug names when brand names are replaced by ingredients, highlighting the need for ongoing refinement and rigorous testing to ensure medical LLM accuracy and reliability. |
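
For readers who want to try the drug-name check described above, here is a minimal, hedged sketch of how it might be wired up with LangTest's `Harness`, assuming the standard `task`/`model`/`data`/`config` entry point; the model name, dataset file, and pass-rate thresholds are illustrative assumptions, not values from this commit.

```python
# Hedged sketch: configuring a drug-name robustness test with LangTest.
# Assumes the standard Harness API; the model name and data file below
# are illustrative placeholders, not values from this commit.
from langtest import Harness

harness = Harness(
    task="question-answering",
    model={"model": "gpt-4o", "hub": "openai"},
    data={"data_source": "medical-qa-sample.jsonl"},  # hypothetical dataset file
    config={
        "tests": {
            "defaults": {"min_pass_rate": 0.75},
            "robustness": {
                # Swap generic drug names for brand names in the inputs
                "drug_generic_to_brand": {"min_pass_rate": 0.90},
            },
        }
    },
)

harness.generate().run()  # build perturbed test cases, then evaluate the model
print(harness.report())   # pass rates per test category
```

The `report()` call summarizes pass rates per test category, which is where failures like the drug-name swaps described in the article would surface.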



@@ -41,6 +41,6 @@ The following table gives an overview of the different tutorial notebooks. In th
| **Generic API-Based Model**: In this section, we discuss how to test API-based models hosted with Ollama, vLLM, and other tools. | Web |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/llm_notebooks/Generic_API-Based_Model_Testing_Demo.ipynb) |
| **Data Augmenter**: This Notebook enables streamlined, harness-free data augmentation, making it simpler to enhance your datasets and improve model robustness. | - |NER | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Data_Augmenter_Notebook.ipynb) |
| **Multi-Dataset Prompt Configs**: In this Notebook, we discuss optimized prompt handling for multiple datasets, allowing users to add custom prompts for each dataset for seamless integration and efficient testing. | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/MultiPrompt_MultiDataset.ipynb) |
| **Multi-Model, Multi-Dataset**: In this Notebook, we discuss testing multiple models on multiple datasets, allowing for comprehensive comparisons and performance assessments in a streamlined manner. | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Multi_Model_Multi_Dataset.ipynb.ipynb) |
| **Multi-Model, Multi-Dataset**: In this Notebook, we discuss testing multiple models on multiple datasets, allowing for comprehensive comparisons and performance assessments in a streamlined manner. | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Multi_Model_Multi_Dataset.ipynb) |
| **Evaluation_with_Prometheus_Eval**: In this Notebook, we discuss how integrating the Prometheus model into LangTest brings enhanced evaluation capabilities, providing more detailed and insightful metrics for model performance assessment. | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Evaluation_with_Prometheus_Eval.ipynb) |
| **Misuse_Test_with_Prometheus_evaluation**: In this Notebook, we discuss new safety testing features to identify and mitigate potential misuse and safety issues in your models. | OpenAI |Question-Answering | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/JohnSnowLabs/langtest/blob/main/demo/tutorials/misc/Misuse_Test_with_Prometheus_evaluation.ipynb) |
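
To make the Multi-Model, Multi-Dataset idea above concrete, here is a hedged sketch of looping the same test suite over several models and benchmark datasets with LangTest's `Harness`; the model names, dataset identifiers, and splits are assumptions for illustration, not values taken from the notebooks.

```python
# Hedged sketch of the multi-model, multi-dataset comparison idea:
# loop over models and datasets, run the same test suite, and collect reports.
# Model names, dataset identifiers, and splits are illustrative assumptions.
from langtest import Harness

models = [
    {"model": "gpt-3.5-turbo", "hub": "openai"},
    {"model": "gpt-4o", "hub": "openai"},
]
datasets = [
    {"data_source": "OpenBookQA", "split": "test-tiny"},  # assumed benchmark identifiers
    {"data_source": "MedQA", "split": "test-tiny"},
]

reports = []
for model in models:
    for data in datasets:
        harness = Harness(task="question-answering", model=model, data=data)
        harness.generate().run()   # create perturbed test cases, evaluate the model
        report = harness.report()  # per-test pass rates for this model/dataset pair
        reports.append((model["model"], data["data_source"], report))

for model_name, dataset_name, report in reports:
    print(model_name, dataset_name)
    print(report)
```

Each `report()` returns per-test pass rates, so the collected results can be compared side by side, which is roughly what the notebook automates in a more streamlined way.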
