Getting Started
Welcome to the Creao dashboard! This guide will help you quickly get started with setting up and using the pipeline for large-scale data generation.
What is a Pipeline?
The Creao pipeline is a framework for building and managing workflows that generate synthetic data using Large Language Models (LLMs). By connecting the provided components, each designed to handle a specific task, you can create customized, streamlined data-generation pipelines adapted to your own use cases.
Components Overview
Creao supports several components to perform different tasks:
- Data Component: Handles Hugging Face data import and processing.
- LLM Component: Processes data based on prompts and generates structured outputs.
- Filter Component: Filters data based on specified conditions.
- Dedup Component: Removes duplicate entries from data.
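To make the division of labor concrete, here is a minimal Python sketch of what the filter and dedup stages do conceptually, applied to in-memory rows. It is not Creao's API: the names filter_rows, dedup_rows, and run_pipeline are hypothetical, the Data and LLM components are left out, and a plain list of dicts stands in for an imported dataset.

```python
from typing import Callable

Row = dict[str, str]
Step = Callable[[list[Row]], list[Row]]


def filter_rows(rows: list[Row], predicate: Callable[[Row], bool]) -> list[Row]:
    """Keep only the rows that satisfy the condition (hypothetical Filter stand-in)."""
    return [row for row in rows if predicate(row)]


def dedup_rows(rows: list[Row], key: str) -> list[Row]:
    """Drop rows whose `key` value was already seen, keeping the first occurrence."""
    seen: set[str] = set()
    unique: list[Row] = []
    for row in rows:
        value = row[key]
        if value not in seen:
            seen.add(value)
            unique.append(row)
    return unique


def run_pipeline(rows: list[Row], steps: list[Step]) -> list[Row]:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        rows = step(rows)
    return rows


# Stand-in for rows loaded by a Data component (assumed example data).
rows = [
    {"question": "What is a pipeline?"},
    {"question": ""},
    {"question": "What is a pipeline?"},
    {"question": "How do I remove duplicates?"},
]

result = run_pipeline(
    rows,
    [
        lambda rs: filter_rows(rs, lambda r: r["question"].strip() != ""),
        lambda rs: dedup_rows(rs, key="question"),
    ],
)
print(result)
```

Each stage takes rows in and returns rows out, which is why the stages can be chained in any order that suits the use case.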
Building Your Pipeline
To build your pipeline:
- Define the Input Dataset:
  - Specify the input dataset path of the input_data component.
  - Click "Pipeline Config" and enter the Hugging Face dataset path.
  - Click "Update Dataset" to load the dataset.
- Add New Components:
  - Custom variables (e.g., extractInterests, rewriteQuestions, etc.) can be added to the pipeline.
  - Click the "Add Component" button to add new components to the pipeline.
  - Configure each component to suit your requirements using the fields on the right side of the dashboard.
- Submit the Pipeline:
  - Click the "Submit Pipeline" button to run the pipeline.
Data & Variables
Creao interacts with three types of data and variables:
- Output from the First Preceding LLM Component:
  - Access output from the initial LLM component in the pipeline, tracing back if necessary.
- Custom Pipeline Variables:
  - Define global variables in the "Add New Variable" section, accessible to all components.
- Dataset Variables:
  - Define variables from specific columns of your input dataset. These variables are available to all components after selecting the parse option.
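As an illustration of how these three sources might combine when a component's prompt is rendered, the sketch below uses Python's standard string.Template. Everything specific here is an assumption made for the example: the $name placeholder syntax, the build_context helper, the precedence order, and the variable names (previous_output, tone, topic). This guide does not specify which syntax Creao prompts use or how name collisions are resolved.

```python
from string import Template


def build_context(
    dataset_row: dict[str, str],
    pipeline_vars: dict[str, str],
    previous_output: str,
) -> dict[str, str]:
    """Merge the three variable sources into one lookup table.

    Assumed precedence for this sketch only: dataset columns override
    custom pipeline variables, and the preceding LLM output is stored
    under the hypothetical name `previous_output`.
    """
    context = dict(pipeline_vars)   # custom pipeline variables (global)
    context.update(dataset_row)     # column values from the current dataset row
    context["previous_output"] = previous_output
    return context


# Hypothetical prompt for an LLM component.
prompt = Template("Rewrite in a $tone tone, on the topic of $topic: $previous_output")

context = build_context(
    dataset_row={"topic": "hiking"},
    pipeline_vars={"tone": "friendly"},
    previous_output="What gear do I need?",
)

rendered = prompt.substitute(context)
print(rendered)
```

Template.substitute raises KeyError when a placeholder has no matching variable, which makes a missing dataset column or an undefined custom variable easy to spot in a sketch like this.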
Conclusion
With these steps, you can set up and customize your Creao Pipeline for efficient data processing! For more detailed information about each component, please refer to the rest of the documentation.
If you have any questions or need further assistance, feel free to reach out!