Artificial Intelligence

Are Prompt Engineers Really Just Data Scientists?

By Ron Bodkin, co-founder and CEO of ChainML

Are prompt engineers data scientists? Even if you don’t encounter many people (or press) asking this question, the growing attention on AI, and on the role of prompt engineers specifically, brings to the forefront some mystery and intrigue over the distinction.

A quick glance at the clickbaity headlines should provide insight into the attention, hype, curiosity, and subtle confusion over the concept (and profession) of a prompt engineer. Prompt engineers are tasked with crafting strategic and intuitive questions (i.e., prompts) to guide chatbots toward performing specific tasks. Since ChatGPT went viral in late 2022, prompt engineers have emerged from obscurity to become, according to LinkedIn data and mainstream media, the next big thing in tech.

How did prompt engineering undergo such a rapid transformation from a task performed primarily by engineers, researchers, and data scientists into an emerging and distinct profession? Where and how have the lines between these professions become blurred?

Humans often resort to simple categories or descriptors for things we don't understand. The question of whether prompt engineers are data scientists (or a mix of both) fits firmly into this box, and it requires us to step back and read between the code.

In this sense, prompt engineers are similar to data scientists, yet calling the two the same is a broad oversimplification. It is valuable, and necessary, to dissect the nuances between the two roles: how they work together (and separately), and how better tools and processes (hint: more prompting) can improve their work and build a bedrock for the rapid innovation we're experiencing across AI every day.

Why Are Prompt Engineers Trending in the First Place? 

It's not entirely surprising that prompt engineers have become such a trending topic amid the viral adoption of chatbots like ChatGPT and Bing Chat, along with a growing Rolodex of AI-powered applications and consumer-friendly plug-ins. To average users, these tools appear to be magic genies of unlimited knowledge, unlocked via simple, conversational prompts (hence the label "conversational AI").

How could a simple text box work such wonders? Part of the answer is that human moderation, though often invisible, provides some accountability and reliability to offset the many ethical and accuracy risks found in generative AI.

Additionally, the idea of a person responsible for asking a chatbot the right questions to get the ideal answers is something most people can make sense of, far more than the technical intricacies of what (and who) ultimately powers chatbots' incredible outputs.

Prompt Engineers vs. Data Scientists — Digging into the Differences

As their name suggests, prompt engineers are responsible for crafting the initial prompts (i.e., sets of instructions and directional guidelines) that applications like Gmail or GitHub Copilot send to large language models like GPT-4 through an API. As a result, prompt engineers are intimately involved in building generative AI applications.
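
For illustration, here is a minimal sketch of that handoff, assuming the OpenAI Python client (v1.x); the model name, system instructions, and user message are placeholders, not anything a specific product actually uses.

    # Minimal sketch: an application sends a prompt to an LLM through an API.
    # Assumes the OpenAI Python client (v1.x) and an OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            # The "system" prompt carries the instructions a prompt engineer crafts.
            {"role": "system", "content": "You are a concise, friendly support assistant."},
            # The "user" message carries whatever the end user typed into the app.
            {"role": "user", "content": "How do I export my data to CSV?"},
        ],
    )

    print(response.choices[0].message.content)

Note that the application never trains the model here; it simply supplies instructions and context, which is precisely where the prompt engineer's work lives.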

Prompt engineers help design the textual guidelines, questions, or conversation starters that users interact with to get responses from the model. These LLMs leverage transformers (the "T" in GPT) to read vast amounts of data, from text to images, and establish patterns in how different pieces of data relate to one another in order to predict what should follow. Imagine a map: prompt engineers help plot the potential directions and parameters that connect the dots from A to B within your conversation and support the model in understanding the data.

In addition to designing the actual prompts, prompt engineers also help control output, refine interactions, and address biases within the models via experimentation and iteration within the prompts themselves. 
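
As a rough, hypothetical illustration of that iteration loop, the sketch below shows how a prompt template might be tightened between versions to constrain format and reduce biased or speculative output; the wording and evaluation criteria are invented for this example.

    # Two hypothetical versions of the same prompt, as a prompt engineer might iterate.
    PROMPT_V1 = "Summarize the customer ticket below."

    PROMPT_V2 = (
        "Summarize the customer ticket below in exactly three bullet points. "
        "Use neutral language, make no assumptions about the customer's expertise, "
        "and flag missing information rather than guessing."
    )

    def build_prompt(template: str, ticket_text: str) -> str:
        """Combine the instruction template with the user-supplied content."""
        return f"{template}\n\nTicket:\n{ticket_text}"

    # In practice, each version is sent to the model across a test set of tickets,
    # and the outputs are scored for format compliance, tone, and accuracy
    # before one version is promoted to production.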

Like prompt engineers, data scientists operate deep behind the scenes, within the code that powers applications. As their name implies, data scientists work intimately with the same sprawling datasets that prompt engineers draw on, ultimately helping optimize an AI model's performance, which in turn strengthens the performance of an application.

However, while prompt engineers typically shape model behavior through experimental and iterative prompting, data scientists feed models vast amounts of data and fine-tune their parameters to help the models recognize and learn patterns within that data, which in turn makes the models better at understanding and responding to prompts. In addition to training models, data scientists can also handle performance evaluation, experimentation, and ethical considerations, just as prompt engineers do, ultimately helping refine the prompts.

While prompt engineers focus mainly on designing the most effective prompts to control a chatbot's output, data scientists leverage machine learning techniques to continuously refine the model's performance via its massive intake of raw data. Both roles are involved in scenario mapping, one language-focused and one data-focused, and together they leverage the capabilities of LLMs to build brilliant AI applications. Below, check out a handy chart (created via ChatGPT) comparing and contrasting the specific skills utilized by data scientists and prompt engineers.

Skill                     Data Scientist     Prompt Engineer
Statistical Analysis      Yes                Yes
Programming               Yes                Yes
Data Wrangling            Yes                Yes
Testing and Evaluation    Yes                Yes
Calling AI Models         Yes                Yes
Training AI Models        Yes                Optional
Data Visualization        Yes                Yes
Big Data                  Yes                Yes
Database                  Yes                Yes
Empathy and User Focus    Yes                Yes
Problem-Solving           Yes                Yes
Communication             Yes                Less so

You'll notice the skills are nearly identical, with two subtle distinctions: training AI models and communication.

So why are we not reading buzzy headlines and job postings on the opportunities (and salaries) for data scientists in the burgeoning field of generative AI? And are these two roles essentially the same if their respective tasks often overlap?

Are prompt engineers just data scientists, or, rather, predominantly data scientists with some added nuance and responsibility? No, not entirely.

Reading Between the Code (and Headlines) 

As the chart above demonstrates, both fields share many skills and help get models to the same, ideally successful and fine-tuned end state. Take collaborating AI agents, for example, which can be used within a generative AI application to implement technical support for a business. In this scenario, collaboration, both among agents and through the work of prompt engineers and data scientists, is what produces a successful outcome for the developer and, later, for the users of their product. Success here isn't merely about the role of a prompt engineer vs. a data scientist, but about the strength of the prompts themselves and how much emphasis is put on planning, breaking down tasks, and reviewing results to enable high-quality outputs.

Here, the skillsets of prompt engineers and data scientists must interact to thrive. 

While prompt engineers are focused mainly on prompting, they still need to experiment and measure the quality of their work, including forming hypotheses about problems, a practice traditionally associated with data scientists. They also still need programming skills, and they often have to experiment with different models to find the best one. When building an application with generative AI models, whether a text-to-image app like Midjourney or a search experience like Bing's Sydney, prompt engineers must be able to fine-tune models to specialize their behavior and to create and modify specialized models. Here, prompt engineers often leverage "embeddings" to improve quality, or train models to improve the content retrieved to prompt an LLM.
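
To make that concrete, here is a heavily simplified sketch of the embedding-and-retrieval pattern, assuming the OpenAI Python client and NumPy; the embedding model name and the knowledge-base snippets are assumptions for illustration only.

    # Sketch: use embeddings to pick the most relevant snippet to include in a prompt.
    # Assumes the OpenAI Python client (v1.x), NumPy, and an OPENAI_API_KEY in the environment.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    def embed(text: str) -> np.ndarray:
        """Turn text into a vector; the model name here is an assumption."""
        resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
        return np.array(resp.data[0].embedding)

    # Hypothetical knowledge-base snippets a support app might draw on.
    documents = [
        "To reset your password, open Settings > Account > Security.",
        "Refunds are processed within 5-7 business days.",
        "Two-factor authentication can be enabled from the Security tab.",
    ]
    doc_vectors = [embed(d) for d in documents]

    def most_relevant(question: str) -> str:
        """Return the snippet with the highest cosine similarity to the question."""
        q = embed(question)
        scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
        return documents[int(np.argmax(scores))]

    question = "How do I turn on 2FA?"
    context = most_relevant(question)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

The retrieved context, not additional training, is what improves the answer in this sketch, which is why this kind of work sits squarely between prompt engineering and data science.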

Here, the lines between prompt engineers and data scientists become even more blurred across the necessary skills and situations, though their end goal, enabling powerful applications, is the same.

There's also something critical missing from the conversation: a third role without a human face, but one that requires human interaction and implementation. Collaborating AI agents are like little, albeit mighty, AI worker bees that perceive their environment to achieve specific goals, increasing the opportunity for control and monitoring within models and applications.

In generative AI, agents leverage AI models as modular components for specialized tasks: for example, one agent queries a database, another cleans and prepares data, a third generates hypotheses and performs statistical analysis, and a fourth produces visual or written insights to help inform strategies.
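
A rough sketch of that modular pattern is below, with each "agent" reduced to a plain Python function and the underlying model calls stubbed out; every name and data value here is hypothetical.

    # Hypothetical, heavily simplified agent pipeline; a real system would add
    # planning, memory, tool use, and human-in-the-loop review.

    def query_agent(source: str) -> list[dict]:
        """Agent 1: pull raw records from a data source (stubbed for illustration)."""
        return [{"region": "EU", "sales": 120}, {"region": "US", "sales": None}]

    def cleaning_agent(records: list[dict]) -> list[dict]:
        """Agent 2: drop incomplete rows before analysis."""
        return [r for r in records if r["sales"] is not None]

    def analysis_agent(records: list[dict]) -> dict:
        """Agent 3: compute simple statistics and propose a hypothesis."""
        total = sum(r["sales"] for r in records)
        return {"total_sales": total, "hypothesis": "EU is the strongest region this quarter."}

    def reporting_agent(findings: dict) -> str:
        """Agent 4: turn findings into a written insight (an LLM call in practice)."""
        return f"Total sales: {findings['total_sales']}. {findings['hypothesis']}"

    report = reporting_agent(analysis_agent(cleaning_agent(query_agent("sales_db"))))
    print(report)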

As an added support layer helping prompt engineers and data scientists train, run, and test the most effective models, agents (if built correctly, with control and oversight) enable repeatable outcomes and orchestrate responsive behavior by balancing human oversight with automation.

In the end, these agents must be built from good prompts and fine-tuned data, which requires the work and talents of both parties: prompt engineers and data scientists.

The Orchestral Symphony of AI 

As you’ve hopefully gathered, an effective AI application isn’t a one-person show, but a symphony of autonomous and human-engineered roles that must work together to achieve an efficient and engaging output that reliably (and responsibly) satisfies its end-goal. While data scientists, prompt engineers, and even collaborating agents perform many essential tasks that deserve recognition and a verifiable industry to nurture performance and even competition, they can’t operate at their best without each other — and without continual, strategic prompting. 

In a society where we often rush to set professions apart in a way that can fuel glorification and fierce division, I find some comfort in the level of collaboration that operates under the hood — and within the code — of our world’s increasingly essential applications.

The views and opinions expressed herein are the views and opinions of the author and do not necessarily reflect those of Nasdaq, Inc.
