I have some trouble to convert my data.frame from a wide table to a long table. At the moment it looks like this:

Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246

Now I like to transform this data.frame into a long data.frame. Something like this:

Code Country        Year    Value
AFG  Afghanistan    1950    20,249
AFG  Afghanistan    1951    21,352
AFG  Afghanistan    1952    22,532
AFG  Afghanistan    1953    23,557
AFG  Afghanistan    1954    24,555
ALB  Albania        1950    8,097
ALB  Albania        1951    8,986
ALB  Albania        1952    10,058
ALB  Albania        1953    11,123
ALB  Albania        1954    12,246

I have looked and tried it already with the melt() and the reshape() functions as some people were suggesting to similar questions. However, so far I only get messy results.

If it is possible I would like to do it with the reshape() function since it looks a little bit nicer to handle.

1 upvote
  flag
Don't know if that was the problem, but the functions in the reshape package are melt and cast (and recast.) – Eduardo Leoni
upvote
  flag
And the reshape package has been superseded by reshape2. – 42-
2 upvote
  flag
And now reshape2 has been superseded by tidyr. – drhagen

5 Answers 11

up vote 53 down vote accepted

reshape() takes a while to get used to, just as melt/cast. Here is a solution with reshape, assuming your data frame id called d:

reshape(d, direction="long", varying=list(names(d)[3:7]), v.names="Value", 
        idvar=c("Code","Country"), timevar="Year", times=1950:1954)
1 upvote
  flag
.....works perfectly fine, thanks. And is it possible to specify already in the reshape() command the resulting rownames? Or do I have to do it in a second step eg: rownames(data) <- c(1:length(rownames(data))) – mropa

Using reshape package:

#data
x <- read.table(textConnection(
"Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246"), header=TRUE)

library(reshape)

x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))

Three alternative solutions:

1: With reshape2

library(reshape2)
long <- melt(wide, id.vars = c("Code", "Country"))

giving:

   Code     Country variable  value
1   AFG Afghanistan     1950 20,249
2   ALB     Albania     1950  8,097
3   AFG Afghanistan     1951 21,352
4   ALB     Albania     1951  8,986
5   AFG Afghanistan     1952 22,532
6   ALB     Albania     1952 10,058
7   AFG Afghanistan     1953 23,557
8   ALB     Albania     1953 11,123
9   AFG Afghanistan     1954 24,555
10  ALB     Albania     1954 12,246

Some alternative notations that give the same result:

# you can also define the id-variables by column number
melt(wide, id.vars = 1:2)

# as an alternative you can also specify the measure-variables
# all other variables will then be used as id-variables
melt(wide, measure.vars = 3:7)
melt(wide, measure.vars = as.character(1950:1954))

2: With data.table

You can use the same melt function as in the reshape2 package (which is an extended & improved implementation). melt from data.table has also more parameters that the melt from reshape2. You can for exaple also specify the name of the variable-column:

library(data.table)
long <- melt(setDT(wide), id.vars=c("Code","Country"), variable.name="year")

Some alternative notations:

melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")

3: With tidyr

library(tidyr)
long <- wide %>% gather(year, value, -c(Code, Country))

Some alternative notations:

wide %>% gather(year, value, -Code, -Country)
wide %>% gather(year, value, -1:-2)
wide %>% gather(year, value, -(1:2))
wide %>% gather(year, value, -1, -2)
wide %>% gather(year, value, 3:7)
wide %>% gather(year, value, `1950`:`1954`)

If you want to exclude NA values, you can add na.rm = TRUE to the melt as well as the gather functions.


Another problem with the data is that the values will be read by R as character-values (as a result of the , in the numbers). You can repair that with gsub and as.numeric:

long$value <- as.numeric(gsub(",", "", long$value))

Or directly with data.table or dplyr:

# data.table
long <- melt(setDT(wide),
             id.vars = c("Code","Country"),
             variable.name = "year")[, value := as.numeric(gsub(",", "", value))]

# tidyr and dplyr
long <- wide %>% gather(year, value, -c(Code,Country)) %>% 
  mutate(value = as.numeric(gsub(",", "", value)))

Data:

wide <- read.table(text="Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246", header=TRUE, check.names=FALSE)
upvote
  flag
great answer, just one more tiny reminder : do not put any variables other than id andtime in your data frame, melt could not tell what you want to do in this case. – Jason Goal
1 upvote
  flag
@JasonGoal Could you elaborate on that? As I'm interpreting you comment, it shouldn't be a problem. Just specify both the id.vars and the measure.vars. – Jaap
upvote
  flag
,then that's good for me, don't know id.vars and the measure.vars can be specified in the first alternative,sorry for the mess, its my fault. – Jason Goal
upvote
  flag
@Jaap I liked your answer. I need to study it – nik

Here is another example showing the use of gather from tidyr. You can select the columns to gather either by removing them individually (as I do here), or by including the years you want explicitly.

Note that, to handle the commas (and X's added if check.names = FALSE is not set), I am also using dplyr's mutate with parse_number from readr to convert the text values back to numbers. These are all part of the tidyverse and so can be loaded together with library(tidyverse)

wide %>%
  gather(Year, Value, -Code, -Country) %>%
  mutate(Year = parse_number(Year)
         , Value = parse_number(Value))

Returns:

   Code     Country Year Value
1   AFG Afghanistan 1950 20249
2   ALB     Albania 1950  8097
3   AFG Afghanistan 1951 21352
4   ALB     Albania 1951  8986
5   AFG Afghanistan 1952 22532
6   ALB     Albania 1952 10058
7   AFG Afghanistan 1953 23557
8   ALB     Albania 1953 11123
9   AFG Afghanistan 1954 24555
10  ALB     Albania 1954 12246

Since this answer is tagged with , I felt it would be useful to share another alternative from base R: stack.

Note, however, that stack does not work with factors--it only works if is.vector is TRUE, and from the documentation for is.vector, we find that:

is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.

I'm using the sample data from @Jaap's answer, where the values in the year columns are factors.

Here's the stack approach:

cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
##    Code     Country values  ind
## 1   AFG Afghanistan 20,249 1950
## 2   ALB     Albania  8,097 1950
## 3   AFG Afghanistan 21,352 1951
## 4   ALB     Albania  8,986 1951
## 5   AFG Afghanistan 22,532 1952
## 6   ALB     Albania 10,058 1952
## 7   AFG Afghanistan 23,557 1953
## 8   ALB     Albania 11,123 1953
## 9   AFG Afghanistan 24,555 1954
## 10  ALB     Albania 12,246 1954

Not the answer you're looking for? Browse other questions tagged or ask your own question.