MATLAB Help Center
Supported Platform: Linux® only.
This example shows you how to create a standalone MATLAB® MapReduce application using the mcc command and run it against a Hadoop® cluster.
Goal: Calculate the maximum arrival delay of an airline from the given dataset.
Dataset: airlinesmall.csv
Description: Airline departure and arrival information from 1987-2008.
To download the airlinesmall.csv file, at the MATLAB command prompt type:
setupExample("matlab/AddKeysValuesExample", pwd)
Start this example by creating a new work folder that is visible to the MATLAB search path.
Before starting MATLAB, at a terminal, set the environment variable HADOOP_PREFIX to point to the Hadoop installation folder. For example:
% setenv HADOOP_PREFIX /usr/lib/hadoop
$ export HADOOP_PREFIX=/usr/lib/hadoop
Note
This example uses /usr/lib/hadoop as the directory where Hadoop is installed. Your Hadoop installation directory may be different.
If you forget to set the HADOOP_PREFIX environment variable before starting MATLAB, set it with the MATLAB function setenv at the MATLAB command prompt as soon as MATLAB starts. For example:
setenv('HADOOP_PREFIX','/usr/lib/hadoop')
Install the MATLAB Runtime in a folder that is accessible by every worker node in the Hadoop cluster. This example uses /usr/local/MATLAB/MATLAB_Runtime/R2025b as the location of the MATLAB Runtime folder.
If you do not have the MATLAB Runtime, you can download it from the MathWorks website.
Copy the map function maxArrivalDelayMapper.m from the /usr/local/MATLAB/R2025b/toolbox/matlab/demos folder to the work folder.
function maxArrivalDelayMapper(data, info, intermKVStore)
    partMax = max(data.ArrDelay);
    add(intermKVStore,'PartialMaxArrivalDelay',partMax);
For more information, see Write a Map Function.
Copy the reduce function maxArrivalDelayReducer.m from the matlabroot/toolbox/matlab/demos folder to the work folder.
function maxArrivalDelayReducer(intermKey, intermValIter, outKVStore)
    maxVal = -inf;
    while hasnext(intermValIter)
        maxVal = max(getnext(intermValIter), maxVal);
    end
    add(outKVStore,'MaxArrivalDelay',maxVal);
For more information, see Write a Reduce Function.
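Before compiling, it can help to confirm that the two functions behave as expected on a local copy of the data. The following is a minimal sketch, not part of the deployment workflow itself. It assumes that airlinesmall.csv and both function files are in the work folder (or elsewhere on the MATLAB path) and uses mapreducer(0) to run in the local MATLAB session instead of on Hadoop. The variable name localResult is chosen here for illustration only.

```matlab
% Sketch: run the same map and reduce functions locally before deploying.
% Assumption: airlinesmall.csv and both function files are on the MATLAB path.
ds = datastore('airlinesmall.csv', ...
    'TreatAsMissing','NA', ...
    'SelectedVariableNames',{'UniqueCarrier','ArrDelay'});

% mapreducer(0) selects the local MATLAB session as the execution environment.
mr = mapreducer(0);

% Same mapper and reducer that the standalone application uses.
result = mapreduce(ds, @maxArrivalDelayMapper, @maxArrivalDelayReducer, mr);

% Read the key-value pairs back as a table.
localResult = readall(result)
```

Because the local run reads the same dataset, it should report the same maximum that the deployed application produces later in this example.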
Create the directory /user/<username>/datasets on HDFS™ and copy the file airlinesmall.csv to that directory. Here <username> refers to your user name in HDFS.
$ ./hadoop fs -copyFromLocal airlinesmall.csv hdfs://host:54310/user/<username>/datasets
Start MATLAB and verify that the HADOOP_PREFIX environment variable has been set. At the command prompt, type:
>> getenv('HADOOP_PREFIX')
If ans is empty, review the Prerequisites section above to see how you can set the HADOOP_PREFIX environment variable.
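The check above can also be scripted so that a missing or wrong setting is reported with a clear message. This is a small sketch under the assumption that the variable should point to an existing folder on the machine running MATLAB. The message wording is illustrative only.

```matlab
% Sketch: report whether HADOOP_PREFIX is set and points to a folder.
hadoopPrefix = getenv('HADOOP_PREFIX');   % returns '' when the variable is not set

if isempty(hadoopPrefix)
    warning('HADOOP_PREFIX is not set. Set it with setenv before continuing.');
elseif ~isfolder(hadoopPrefix)
    warning('HADOOP_PREFIX points to "%s", which is not a folder.', hadoopPrefix);
else
    fprintf('Using Hadoop installation at %s\n', hadoopPrefix);
end
```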
Create a new MATLAB script with the name depMapRedStandAlone.m. Add the code from the following steps to this script file.
Create a datastore that points to the airline data in Hadoop Distributed File System (HDFS).
ds = datastore('hdfs:///user/<username>/datasets/airlinesmall.csv',...
    'TreatAsMissing','NA',...
    'SelectedVariableNames',{'UniqueCarrier','ArrDelay'});
For more information, see Work with Remote Data.
Configure the application for deployment against Hadoop with default settings.
config = matlab.mapreduce.DeployHadoopMapReducer;
The class matlab.mapreduce.DeployHadoopMapReducer can be used to configure a standalone application based on the Hadoop environment where it is going to be deployed.
For example, if you want to specify the location of the MATLAB Runtime on each of the worker nodes on the cluster, include a line of code similar to this:
config = matlab.mapreduce.DeployHadoopMapReducer('MCRRoot','/opt/MATLAB/MATLAB_Runtime/R2025b');
For information on specifying additional cluster-specific properties, see matlab.mapreduce.DeployHadoopMapReducer.
Specifying a MATLAB Runtime location as part of the class matlab.mapreduce.DeployHadoopMapReducer will override any MATLAB Runtime location specified during the execution of the standalone application.
Define the execution environment using the mapreducer function.
mr = mapreducer(config);
Apply the mapreduce function.
result = mapreduce(...
    ds,...
    @maxArrivalDelayMapper,@maxArrivalDelayReducer,...
    mr,...
    'OutputType','Binary', ...
    'OutputFolder','hdfs:///user/<username>/results/myresults');
An HDFS directory such as .../myresults can be written to only once. If you plan on running your standalone application multiple times against the Hadoop cluster, make sure you delete the .../myresults directory on HDFS prior to each execution. Another option is to change the name of the .../myresults directory in the MATLAB code and recompile the application.
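A third possibility, sketched here as an assumption rather than as part of the documented workflow, is to build the output folder name at run time so that each execution writes to a fresh directory without recompiling. The timestamp suffix and the variable names resultsRoot, runStamp, and outputFolder are hypothetical. <username> remains a placeholder for your HDFS user name.

```matlab
% Sketch: give every run its own HDFS output folder by appending a timestamp.
% resultsRoot is a placeholder; replace <username> with your HDFS user name.
resultsRoot = 'hdfs:///user/<username>/results';

% Format the current time as text, for example 20250101_093000.
runStamp = char(datetime('now','Format','yyyyMMdd_HHmmss'));

outputFolder = [resultsRoot '/myresults_' runStamp];

% In the script, pass the generated name instead of a fixed string:
%   result = mapreduce(ds, @maxArrivalDelayMapper, @maxArrivalDelayReducer, ...
%       mr, 'OutputType','Binary', 'OutputFolder',outputFolder);
```

With this approach, old result directories accumulate on HDFS, so they still need to be cleaned up periodically.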
Read the result from the resulting datastore.
myAppResult = readall(result)
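The table returned by readall stores each value in a cell, so a further step is needed to use the maximum as a number. The sketch below uses a stand-in table with the same Key and Value layout as the output shown in this example, so that it runs without a cluster. In the script you would operate on the real myAppResult instead. The names isMax and maxDelay are illustrative.

```matlab
% Stand-in for the table that readall returns (layout and value taken from
% the output shown in this example).
myAppResult = table({'MaxArrivalDelay'}, {1014}, ...
    'VariableNames', {'Key','Value'});

% Locate the row by key, then use brace indexing to unwrap the cell.
isMax = strcmp(myAppResult.Key, 'MaxArrivalDelay');
maxDelay = myAppResult.Value{isMax};

fprintf('Maximum arrival delay: %d\n', maxDelay);
```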
Use the mcc command with the -m flag to create a standalone application.
mcc -m depMapRedStandAlone.m
The -m flag creates a standard executable that can be run from a command line. However, the mcc command cannot package the results in an installer.
Run the standalone application from a Linux shell using the following command:
$ ./run_depMapRedStandAlone.sh /usr/local/MATLAB/MATLAB_Runtime/R2025b
/usr/local/MATLAB/MATLAB_Runtime/R2025b is an argument indicating the location of the MATLAB Runtime.
Before running the command above, verify in the terminal that the HADOOP_PREFIX environment variable is set by typing:
$ echo $HADOOP_PREFIX
Your application will fail to execute if the HADOOP_PREFIX environment variable is not set.
You will see the following output:
myAppResult =

           Key           Value
    _________________    ______

    'MaxArrivalDelay'    [1014]
To learn more about using the map and reduce functions, see Getting Started with MapReduce.
Complete code for the standalone application depMapRedStandAlone can be found here:
%% Create datastore
ds = datastore(...
    'hdfs:///user/<username>/datasets/airlinesmall.csv',...
    'TreatAsMissing','NA',...
    'SelectedVariableNames',{'UniqueCarrier','ArrDelay'});

%% Configure application for deployment against Hadoop with default settings
config = matlab.mapreduce.DeployHadoopMapReducer;

%% Define the execution environment
mr = mapreducer(config);

%% Apply the mapreduce function
result = mapreduce(...
    ds,...
    @maxArrivalDelayMapper,@maxArrivalDelayReducer,...
    mr,...
    'OutputType','Binary', ...
    'OutputFolder','hdfs:///user/<username>/results/myresults');

%% Read the result from the resulting datastore
myAppResult = readall(result)
datastore | TabularTextDatastore | KeyValueDatastore | matlab.mapreduce.DeployHadoopMapReducer | mcc